Sunday 23 November 2008

Cheer leading for creative writers

new month, new job

I now work for Oracle based in Thames Valley Park, Reading in the UK.

I am usually the last to hear any news about corporate takeovers.

Does this mean I will have an Oracle sanctioned blog named 'Siebel Evangelist' ? Probably not.

out of the closet

My name is Norman Brightside. I work for Siebel Systems based in Egham in the UK. I am a Senior Architecture Specialist in Expert Services. Expert Services provide a range of consulting services from architecture workshops to performance troubleshooting. My work is technical (usually includes Oracle), varied, includes travel to Siebel customers located within the UK and Europe and perhaps, most importantly, involves meeting interesting people and learning.

how bind variables made me a demigod

In a previous life, I was a development DBA. Sorry that's not quite true. My job title was 'Persistence Architect' for a J2EE application. The Oracle database wasn't really considered to be a database. In fact, it wasn't even considered to be a repository either. The database was merely a means of persisting objects. There was a swear box for the terms 'Table', 'Column', 'Database', 'record, 'schema', 'SELECT' and bizarrely, 'PMON'. This meant I had to resurrect the word 'tuple'. Anyway, I was responsible for the production system which was lightly loaded and adequately configured so life was easy and I could blog and surf all day. Well, actually I couldn't because I didn't blog back then and I was behind a corporate firewall so I just had some EJB's and the complete Oracle 9i documentation set for company. One Friday, a developer came along and asked me to restart the DEV database. I asked 'Why ?'. He replied 'Because it is the second Friday in the month'. I asked 'Why ?'. 'Because we restart it every two weeks at Friday lunchtime'. 'Why ?'. 'Because that's what Bryan used to do' 'Why ?'. 'To fix the Oracle bug where we can't instantiate any more objects'. 'What Oracle bug ?'. 'Look - I really don't know - some stupid low level Oracle error. Please just restart it'. Ten minutes investigations revealed he was indeed speaking the truth. The Oracle listener was failing with 'Unable to fork process' due to a lack of resources. Just like the objects, database connections were also persisted. This was a development Solaris server with lots of components co-hosted with 2 CPU, 512 MB memory and a paltry 128 MB swap space. Configured more swap space, ordered more memory. Job done. Everyone was happy. All objects could be instantiated successfully and I didn't have to set a fortnightly reminder to restart Oracle on Friday lunchtime. So, I was able to return to the 'Concepts' guide and my status was elevated from 'quiet man in corner' to 'Hero'. A couple of months later, I was asked to look into the 'appalling lack of performance scalability' of the database on UAT. Apparently, during load tests, the J2EE application could only process 800 transactions per hour. Well, they weren't actually conventional database transactions but rather complex, involved business processes. UAT had a 128 MB buffer cache and a 1 GB shared pool. Odd. The shared pool was littered with lots of almost identical SQL statements with embedded literals. I suspect the shared pool was originally 128 MB, the library cache hit ratio was low and some performance tuning wizard (human or otherwise) recommended 'Low library cache hit ratio means increase shared pool immediately'. Repeat ad infinitum. I summoned up the courage and talked to the developers. 'Would it be possible to modify the application to use bind variables ?' 'No. Listen. We just use objects. We don't make database calls'. Some more investigations. The J2EE application uses TopLink which is an interface layer translating the objects into database accesses. I decided to read the TopLink manual and suggested setting the TopLink configuration parameter 'should-bind-all-parameters' to True and repeat the test. While the development team made the changes, I reduced the shared pool to 128MB and increased the buffer cache to 1GB. Repeat test. Staggering improvement to over 3,000 'transactions' per hour. My status is immediately elevated to 'demigod'. Why, they were so grateful, a Java developer finally divulged the proxy they were all using to access the internet. I created my own private swear box for the terms 'EJB', 'J2EE', 'Container Managed Persistence', 'multiple inheritance', 'classes', 'methods', 'destructors' and 'Persistence Architect' and returned to quietly reading the fine Oracle manuals.

introduction to ETL for Oracle

There are a variety of methods to extract, transform and load (ETL) data in and out of an Oracle database. This short article provides an overview of some of the basic options that are supplied with Oracle.

L is for Load

Firstly, we need a table to load data into. This SQL script creates a table called PERSON.
create table person
(person_id number(12),
forename varchar2(200),
surname varchar2(200),
title varchar2(10),
email varchar2(100),
gender varchar2(1),
ni_number varchar(30),
address1 varchar2(100),
address2 varchar2(100),
town varchar2(100),
county varchar2(100),
postcode varchar2(10),
country varchar2(100),
phone varchar2(100),
mobile varchar2(100),
salary number(12),
date_of_birth date);
The following PL/SQL procedure may be used to populate this table with an arbitrary amount of data.
create or replace
procedure load_person(p_records in pls_integer)
is
v_person_id person.person_id%type;
begin
for i in 1 .. p_records
loop
v_person_id := to_char(i);
insert into person
(person_id,
forename,
surname,
title,
email,
gender,
ni_number,
address1,
address2,
town,
county,
postcode,
country,
phone,
mobile,
salary,
date_of_birth)
values
(v_person_id,   /* person_id */
'Norman',     /* forename */
'Brightside',    /* surname */
'Mr',    /* title */
'norman@brightside.com',  /* email */
'M',    /* gender */
'AA123456B',   /* ni_number */
'Leafy Cottage',    /* address1 */
'1 New Street',   /* address2 */
'Anytown',    /* town */
'Anycounty',    /* county */
'NN12 9ZZ',    /* postcode */
'UK',     /* country */
'0208444555',   /* phone */
'07789123321',   /* mobile */
20000,    /* salary */
'01-JAN-1970'   /* DOB */
);
end loop;
commit;
end;
Loading 1,000,000 records took 1 min 54 secs running Oracle 10g (10.2.1) on my home PC. One obvious deficiency in the PL/SQL is that all records (and the SQL statement) are identical. In practice, this will not be the case. Therefore, it may be beneficial to replace the literal SQL with bind variables using using native, dynamic SQL.
create or replace
procedure load_bind(p_records in pls_integer)
is
sql_stmt varchar2(2000);
v_person_id person.person_id%type;
v_forename person.forename%type := 'Norman';
v_surname person.surname%type := 'Brightside';
v_title person.title%type := 'Mr';
v_email person.email%type := 'norman@brightside.com';
v_gender person.gender%type := 'M';
v_ni_number person.ni_number%type := 'AA123456B';
v_address1 person.address1%type := 'Leafy Cottage';
v_address2 person.address2%type := '1 New Street';
v_town person.town%type := 'Anytown';
v_county person.county%type := 'Anycounty';
v_postcode person.postcode%type := 'NN12 9ZZ';
v_country person.country%type := 'UK';
v_phone person.phone%type := '0208444555';
v_mobile person.mobile%type := '07789123321';
v_salary person.salary%type := 20000;
v_date_of_birth person.date_of_birth%type := to_date('01-JAN-1970', 'DD-MON-YYYY');
begin
sql_stmt :=
'insert into person
(person_id,
forename,
surname,
title,
email,
gender,
ni_number,
address1,
address2,
town,
county,
postcode,
country,
phone,
mobile,
salary,
date_of_birth)
values (:v_person_id,
:v_forename,
:v_surname,
:v_title,
:v_email,
:v_gender,
:v_ni_number,
:v_address1,
:v_address2,
:v_town,
:v_county,
:v_postcode,
:v_country,
:v_phone,
:v_mobile,
:v_salary,
:v_date_of_birth)';
for i in 1 .. p_records
loop
v_person_id := to_char(i);
execute immediate sql_stmt using
v_person_id,
v_forename,
v_surname,
v_title,
v_email,
v_gender,
v_ni_number,
v_address1,
v_address2,
v_town,
v_county,
v_postcode,
v_country,
v_phone,
v_mobile,
v_salary,
v_date_of_birth;
end loop;
commit;
end;
/
show errors
The modified PL/SQL procedure took 2 mins 1 sec for 1 million records. This doesn't imply that bind variables are worse than literals. More likely, that parsing overhead is relatively small for insert statements.

E is for Extract

People often ask why Oracle do not provide an equivalent SQL*Unloader to dump tables to delimited files. Well, the standard SQL*Plus utility may be used for such a purpose. Tom Kyte created some generic wrapper scripts (for both Unix and Windows) to dump an Oracle table to a delimited text file. The 'flat' utility was used to dump the contents of the PERSON table into a comma separated value (CSV) file containing 1 million (identical) records. SQL*Loader was then used to immediately load the data back in to the PERSON table again using this control file.
LOAD DATA
TRUNCATE
INTO TABLE person
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(person_id,
forename,
surname,
title,
email,
gender,
ni_number,
address1,
address2,
town,
county,
postcode,
country,
phone,
mobile,
salary,
date_of_birth
)
The time taken was 1 min 25 secs for conventional mode and 28 seconds when specifying DIRECT=TRUE for a direct path load. Another interesting and often overlooked method for loading data is the use of external tables. This allows the creation of a virtual read-only table which is mapped directly to the host file using a CREATE TABLE syntax (similar to that for SQL*Loader control file). This SQL script creates an external table which can then be analysed using standard SQL query tools without having to actually load a database table.
create table person_ext
(person_id number(12),
forename varchar2(200),
surname varchar2(200),
title varchar2(10),
email varchar2(100),
gender varchar2(1),
ni_number varchar(30),
address1 varchar2(100),
address2 varchar2(100),
town varchar2(100),
county varchar2(100),
postcode varchar2(10),
country varchar2(100),
phone varchar2(100),
mobile varchar2(100),
salary number(12),
date_of_birth date)
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS (FIELDS TERMINATED BY ',')
LOCATION ('person.dat')
);
It is then trivial to load data from the external table into a conventional Oracle table as follows:
SQL> insert into person select * from person_ext;
This method of data loading took 39 secs.

T is for Transform

And deliberately left until last. Extracting and loading data is relatively easy. Often the time consuming and complicated task in ETL is to identify, transform and potentially clean the data items from the source system and map them to tables in the target database.

interviewing stories #3

A few years ago, I worked for a small internet company and was a successful dot com millionaire. Then the confounded Italians went and devalued the Lira. We spent (wasted) lots of time interviewing Oracle developers. Interviewing was time consuming and we had a fairly rapid turnover of people with varying technical abilities and personality disorders. After a while, we decided to set a short technical test in an effort to save time and improve the quality of the technical people we hired. The test was a trivial piece of PL/SQL. Count all the rows in all the tables in a specified schema. The consequences of introducing of this test were very revealing. Some candidates (with years of Oracle and PL/SQL experience) flatly refused to take the test at all - 'the agent didn't say anything about a technical test'.? Some candidates asked if they could have access to our embryonic Oracle technical library (a couple of O'Reilly books). Some candidates asked it they could use their own technical books for reference. We flatly refused. It was interesting to see whether people elected for Linux/vi or Windows/notepad. It was interesting to see whether people could deliver under pressure. It was interesting to see whether people asked questions. It was interesting to see whether people actually compiled and executed the code. Some candidates thought, incorrectly, that the test was purely to 'write the code'. It was interesting to see whether people visited 'http://tahiti.oracle.com/'. We said you couldn't use books. However, we didn't say you couldn't use the internet. It was interesting to see whether people commented a trivial procedure. It was interesting to see whether people counted the rows or checked for the presence of recent statistics first. It was interesting to see whether people used native, dynamic SQL or tortured themselves with DBMS_SQL. It was interesting to see whether people favoured SQL*Plus, Pro*C, PL/SQL or Perl/DBI. The standard of candidates improved. If the technical test was unsatisfactory, we skipped the interview formalities to save time (theirs and ours). One outstanding candidate delivered an excellent test. Perfect, commented PL/SQL. He even included a file header with CVS macros for version control. The results were correct. He finished the assigned task in minutes. He even used the singular if a table had '1 row'. He was an excellent Oracle PL/SQL developer. However, we didn't hire him because there was an indefinable, intangible concern, a nagging worry that we couldn't articulate or put our finger on precisely. It just didn't feel right. Hard to explain to the agent. Even harder to explain to him but sometimes you have to go with your instincts.
Page: ← Previous  1 … 7 8 9 10 11 Next →