CLOBs
Not many tables demand the use of CLOBs (Character Large Objects). CLOB data type columns can store up to 4 GB of text.
When designing a new application, developers face a dilemma – I need a column where I need to store a large block of text, should I use multiple VARCHAR2 columns or CLOB? The decision to use VARCHAR2 instead of CLOB is usually driven by the fear of unknown. One should always use CLOB when it comes to making a decision about multiple VARCHAR2 columns vs. CLOB to store a single logical piece of textual data.
Reasons to use CLOB
- “Ridicule me for life” scenario – The author who created an Oracle table with VARCHAR2 columns such as SQL_TEXT1, SQL_TEXT2, SQL_TEXT3 to overcome lengh limitations has opened himself/.herself to ridicule by peers for years to come
- There are still limitations – Since VARCHAR2 columns are limited to 4000 characters max, it still does not solve the problem for extremely large texts
- Tedious to use – Every time a text is needed as a whole, the multiple VARCHAR2 columns have to be concatenated together. Even PL/SQL VARCHAR2 data type only supports up to 32 Kb.
- Hard to maintain – The moment someone realized that SQL_TEXT4 is needed, all the code and query references have to be updated. Single CLOB column is really simple.
- Oracle internally treats inline CLOB values less than 4000 characters as VARCHAR2
- Performance – Indexing and appropriately using multiple VARCHAR2 columns is difficult at best and impossible at worst. Indexing CLOBs are slightly different from regular columns though.
When to use what datatype?
Oracle, for its part, has slowly migrated away from LONG data type to CLOB on its system tables. Choose between these data types based on your need for your applications
Data Type | Description |
CLOB | Use this to store large ANSI text |
NCLOB | Same as CLOB except that characters are stored in multi-byte character set. Uses twice the space as CLOB to store the same thing when used inappropriately. |
BLOB | Binary large objects – like images, videos or anything that is binary and large in nature. |
XMLType | Use this to store XML data. Oracle has rich support for querying and updating XML type columns using SQL. |
VARCHAR2 | It is well known that the column will NEVER ever need to store more than 4000 characters max |
NVARCHAR2 | Similar to CLOB vs. NCLOB difference above |
BFILE | Data is stored in a binary file on the operating system file system instead of inside the Oracle database files |
Once the data is stored in a CLOB column for example, querying and using the data may require that the CLOB be converted to VARCHAR2 to make it more human-readable or when we need to using functions that are available only for the VARCHAR2 data type.
CLOB output (unreadable)
Normally, when querying using commonly available tools, CLOBs would look like this:
SELECT * FROM dba_hist_sqltext
DBID | SQL_ID | SQL_TEXT | COMMAND_TYPE |
314195459 | 0rc4km05kgzb9 | (HUGECLOB) | 3 |
314195459 | 39m4sx9k63ba2 | (HUGECLOB) | 3 |
314195459 | d3g73kux34mca | (HUGECLOB) | 3 |
CLOB output (readable using DBMS_LOB)
To convert them to VARCHAR2, use the DBMS_LOB package:
SELECT a.*, dbms_lob.substr( sql_text, 4000, 1) sql_text_vc FROM dba_hist_sqltext a
DBID | SQL_ID | SQL_TEXT | COMMAND_TYPE | SQL_TEXT_VC |
314195459 | 0rc4km05kgzb9 | (HUGECLOB) | 3 | select 1 from obj$ where name=’DBA_QUEUE_SCHEDULES’ |
314195459 | 39m4sx9k63ba2 | (HUGECLOB) | 3 | select /*+ index(idl_ub2$ i_idl_ub21) +*/ piece#,length,piece from idl_ub2$ where obj#=:1 and part=:2 and version=:3 order by piece# |
314195459 | d3g73kux34mca | (HUGECLOB) | 3 | with temp_kk as (select code fk_business_unit, description processing_group from gotm_lookup_ext gle |
SUBSTR works but stick with DBMS_LOB
Even a regular SUBSTR works fine with CLOB’s. It does need to be converted with TO_CHAR to be human-readable as text. The following will produce the same result as above although using the DBMS_LOB supplied functions is the recommended and safe way to work with LOB’s.
SELECT a.*, TO_CHAR(SUBSTR( sql_text, 1, 4000)) sql_text_vc FROM dba_hist_sqltext a
In the above example, only the first 4000 characters are fetched. However, it is easy to alter the third parameter, the starting position and fetch other parts of the text.
In fact, the DBMS_LOB package offers a lot of functionality to work with large object data types. Please read about it here. Additional information on working with LOB’s is also available at Oracle Docs.
But CLOB and VARCHAR2 can also store multi-byte characters. It depends on database character set NLS_CHARACTERSET. Corresponding setting for NCLOB and NVARCHAR2 is NLS_NCHAR_CHARACTERSET.
Thanks for the clarification and feedback. It is very confusing and I don’t certainly claim to be an expert!
So in summary, the database character set defined at creation determines the encoding to be used in the SQL CHAR datatypes. If the defined database character set (which can be changed but one should not as it leads to corruption) is not the desired one or is an after-thought, the SQL NCHAR datatypes should be used.
This is a good reference:
https://docs.oracle.com/cd/B19306_01/server.102/b14225/ch6unicode.htm
Right, this is good reference although about database 10g. Already at that time was possible to set AL32UTF8 (multi-byte) as character set of database which supports most of languages. There is no need to change it. Except for Chinese, Japanese – then NCHAR, NCLOB is useful.
If I use a CLOB instead of several VARCHAR2 fields, do I reference the field in a standard SQL the same way? For Example: if the DESCRIPTION field is defined as a CLOB, will the following work?
select course_id,description,units from CLASSES;
Your SQL will work but “work” means different things as the client software (.NET/JAVA/etc) has to be ready to handle the CLOB data type. If you want the column to be treated just like VARCHAR you have to do the conversion explicitly with DBMS_LOB as shown in the post above.
How would you visualize a CLOB file that has more than 4000 Characters. We have some comment fields in which we’re storing data that’s over 37k Characters. If I’d like to see and read them. dbms_lob.substr( sql_text, 4000, 1) sql_text_vc will limit to 4000. What’s the solution for this?
Hi Daniel. Yes, you are correct. DBMS_LOB.SUBSTR can only get 4000 characters at a time. Within PL/SQL you can access 32k in a single shot but outside, in regular SQL, you have to deal with it 4k at a time. If you are working with your own client applications using .NET or Java, you can access them all.
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:367980988799
Cool article. Did you ever tried PostgreSQL as an Oracle replacement. It has a stellar text support with everything you will ever need. Just use a varchar column and you can store up to 1 GB of string in it, while all string functions are working without limitation.
Well it’s 2020!
Thank you Gweny. I have heard a lot of good things about PostgreSQL but have not had the chance to work with it! Good to know about its VARCHAR abilities.
I would never consider multiple columns. It’s a basic rule or normalization.
I was looking at text formatted in a wysiwyg editor, so there would be multiple lines. I was considering creating a table to store the lines with a pk something like clob_id, line_number. Code would be resilient since all that happens as the clob gets bigger is more rows in that table. The owning table would store the clob_id.
You would have to write code differently. Finding a record in the owning table that contains certain text would use a where exists or, if too many rows for exists, there are other ways to make that happen efficiently.