@import - Importing data
Only in DbVisualizer Pro
This feature is only available in the DbVisualizer Pro edition.
Instead of using the GUI to import data, you can use client-side commands, i.e. @import. This enables you to use the DbVisualizer command-line interface to automate your imports and to utilise other client-side commands such as @export, @mail, among others. The @import command supports the following formats:
- Excel: The xlsx format or the legacy xls format
- XML: The same XML formats that can be exported with DbVisualizer
- JSON: The same JSON formats that can be exported with DbVisualizer
- CSV: Importing CSV files supports many configuration options, such as multi-symbol column separators, multi-line values, etc.
- TXT: Importing fixed width text files
The client-side commands for @import are:
Command | Description |
---|---|
@import on | Starts an import session |
@import set | Sets parameters for the import |
@import parse | Parses and converts the source data to an intermediary format, and analyzes the data |
@import target | Identifies the target table and which target columns should be used |
@import execute | Runs the import |
The import process is explained by the following figure.
The @import parse step (1) lays the foundation for the database import: based on the source file format (csv, json, xlsx, etc.), it parses the data into an intermediary, internal file. This file is then analyzed in order to detect widths, data types and their sizes based on the data, row and cell references back to the source file, and more.
The intermediary format stores the input data as records, together with information about where each record originates.
Once the data has been parsed, some basic tests are performed to see if the properties of the data are compatible with the properties of the target table, e.g. whether the input data will fit the targeted table columns. This is done when the @import target command is run (2).
The final step of an import is to execute (3-4) the actual operations towards the database, i.e. run the INSERT statements. Any failure in the import includes a reference to where in the source file the invalid data originates.
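In script form, the numbered steps map onto the commands as follows (a sketch only; the file and table names are placeholders):
@import on;
-- (1) parse the source file into the intermediary format and analyze it
@import set ImportSource="mydata.csv";
@import parse;
-- (2) identify the target table and check that the parsed data fits it
@import target Table="MYTABLE";
-- (3-4) execute the INSERT statements towards the database
@import execute;
@import off;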
Following is a complete example where a simple CSV file is imported into a target table FRUITS.
Importing data - Example
The example shows how data about fruits in a CSV file, fruits.csv, is imported into a database table fruits. It is a minimal example where the columns of the CSV file are mapped directly to the table columns.
1,Banana,1.22
2,Apple,0.35
3,Orange,0.55
The SQL needed to import the file
@import on;
@import set ImportSource="fruits.csv";
@import parse;
@import target Table="FRUITS";
@import execute;
@import off;
Running the import script in DbVisualizer
As seen in the screenshot above, 3 records were inserted and 1 was skipped (the header).
Read on in the guide to see how you can tailor your import using the different parameters of the import commands.
The @import commands
@import on
The @import on command initialises the import session. When a client-side command import session is started an output folder is created where DbVisualizer generates data representing intermediate results of the import. The output folder is created under a root folder (importresults) in the DbVisualizer's current workspace. (By default, this is the .dbvis folder in the user's home folder.) The name of the folder is generated to be unique.
These folders are automatically cleaned by DbVisualizer on a regular basis.
Note: Normally there is no need to specify any parameters to the @import on command. Default values should be sufficient for most uses.
Parameter | Default | Description |
---|---|---|
ImportResultRoot | WORKSPACE/importresults | The root folder for parsing results and other temporary files produced by the import session. Note: Any results produced in this directory will automatically be removed on a regular basis. |
ImportResultDir | Automatically generated as a sub folder to WORKSPACE/importresults | The path where the DbVisualizer import results are put. If the path is absolute, the results are stored in this folder. If it is not absolute, it is assumed to be relative to ImportResultRoot. The folder is created when the import is started. If the folder already exists, the import fails unless the Clean parameter is also used. If neither ImportResultRoot nor ImportResultDir is specified, DbVisualizer creates a new uniquely named directory for every import session. |
Clean | false | If true, the ImportResultDir is cleaned before import. If false, the import fails if the directory exists. There is no need to specify this parameter if ImportResultDir is not specified. |
ContinueImportFile | | Specified if you are continuing an import. See the chapter about Continuing an Import. |
Example 1
@import on ImportResultDir=/tmp/myimport Clean="true";
Import results are stored in the folder /tmp/myimport. Any existing folder /tmp/myimport will be deleted/cleaned.
Example 2
@import on ImportResultDir="newimport";
Import results are stored in WORKSPACE/importresults/newimport. If this directory exists the import will fail.
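Example 3
You can also change only the root folder and let DbVisualizer generate a unique sub folder for the session. A sketch, where the path is a placeholder:
@import on ImportResultRoot="/tmp/myimports";
Import results are stored in an automatically generated, uniquely named folder under /tmp/myimports, as ImportResultDir is not specified.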
@import set
The @import set command takes a parameter name followed by an equals sign and a value, e.g. parameter="value". You can use the following parameters, where ImportSource is the only required one. All names are case-insensitive. Note that you may use multiple @import set commands, but the first one must include the ImportSource parameter, setting the input data to import.
Parameter | Default | Description |
---|---|---|
BooleanFalseFormats | false, no, 0, off | Comma-separated string of values that should be interpreted as boolean false. |
BooleanTrueFormats | true, yes, 1, on | Comma-separated string of values that should be interpreted as boolean true. |
CsvColumnDelimiter | | Can be used to override the auto detection by specifying a column delimiter, e.g. CsvColumnDelimiter=";". The most common column delimiters are auto detected. |
CsvColumnDelimiterType | Auto Detect | Valid values are String and Auto Detect. If only CsvColumnDelimiter is specified in the command, CsvColumnDelimiterType is set to String. |
CsvTextQuotedBetween | None | Quoting data is needed if the data itself contains the delimiter used to separate columns, special characters such as tabs, multi-line separators, etc. The CsvTextQuotedBetween parameter specifies the identifier used to enclose this type of data. Note that there may not be multiple text quote identifiers in the same import source. Any character is valid to use. Some additional aliases can be used to make the values clearer: Single is equal to specifying the value as "'"; Double is equal to specifying the value as """" (note that a " character needs to be escaped with an additional "); None is equal to specifying the value as "" (the default). Example 1: For 1 'Apple' 'Round fruit', use CsvTextQuotedBetween="Single". Example 2: For 2 "Lemon" "Yellow fruit", use CsvTextQuotedBetween="Double". Example 3: For 3 |Lemon| |Yellow fruit|, use CsvTextQuotedBetween="|". If you have quoted data in your import source and do not specify CsvTextQuotedBetween, the data will be detected as text data. |
DateFormat | yyyy-MM-dd | See valid formats in the Changing the Data Display Format document. |
DecimalSeparator | . | The decimal separator to use when parsing decimal numbers. |
Encoding | As set in Tool Properties | The file encoding to use when parsing the file to import. |
ErrorIncludeStackTrace | false | true/false. If true, the Java stack trace of any exception is included in error messages. |
ExcelCellPolicyError | SKIP | Defines what to do for a formula cell where the cached value indicates an error. Possible values: EMPTY (set the cell to blank), SKIP_ROW (skip the complete row), ERROR (produce an error for the cell/row), TO_STRING (include the error as data). |
ExcelSheetId | 0 | An id specifying the sheet of a workbook when importing from xls/xlsx files. The first sheet has id 0. Either this parameter or the ExcelSheetName parameter can be used, not both. |
ExcelSheetName | | A name specifying the sheet of a workbook when importing from xls/xlsx files. Either this parameter or the ExcelSheetId parameter can be used, not both. |
FailOnConvertFailure | false | If true, the import will fail if data conversion fails. |
FailOnNoColumnsFoundFailure | false | If true, the import will fail if no columns are found when continuing an import. |
FailOnParseFailure | false | If true, the import will fail if a failure occurs while parsing the source data. |
HeaderStartRow | 0 | The row index of the row where the header starts. If set to a number x, the StartRowOfData parameter is automatically set to x+1. The default value 0 indicates that the source data has no header information. |
ImportSource | | A path to the file to import. Must be included in the first @import set command. The path is an absolute path or a path relative to the script location. If a @cd command has been run before the @import set command, a relative path is relative to the @cd directory. |
Format | | Specifies the format of the file to import. Normally the format is auto detected, which means that this parameter is optional. Possible values are: CSV, XLS, XLSX, XLSM, JSON, TXT, XML. |
MaxRows | -1 | The maximum number of rows to parse/import. |
ShowNullAs | | The value that should be considered as NULL, e.g. (null). |
SkipEmptyRows | true | true/false |
SkipHeader | true | true/false. If set to false, the header is also imported. |
SkipRowsStartingWith | | A string; rows starting with this string are skipped. |
StartRowOfData | 1 | The row index of the row where data starts. See also HeaderStartRow. |
ThousandSeparator | , | The thousand separator to use when parsing numbers. |
TxtColumns | | Used when fixed column width text files are imported, e.g. TxtColumns="0, 4". For the detailed syntax of the TxtColumns parameter, please see the example Importing fixed column width input data. |
TxtTrim | true | If true, the column data retrieved when importing fixed column width text files is trimmed. |
TimeFormat | HH:mm:ss | See valid formats in the Changing the Data Display Format document. |
TimeStampFormat | yyyy-MM-dd HH:mm:ss.SSSSS | See valid formats in the Changing the Data Display Format document. |
Example 1
@import set StartRowOfData="5" SkipRowsStartingWith="//";
We are starting to import data from row 5 of the source data file and skipping rows starting with "//".
Example 2
@import set HeaderStartRow="1";
The input data has header information at row 1. We are starting to import data from row 2: as StartRowOfData is not explicitly set, it is automatically set to 2.
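Example 3
Settings may also be spread over several @import set commands, as long as the first one includes ImportSource. A sketch with placeholder values:
@import set ImportSource="fruits.csv";
@import set DateFormat="yyyy/MM/dd" ShowNullAs="(null)";
@import set MaxRows="1000";
The first command points out the source file; the later ones adjust parsing parameters before @import parse is run.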
@import parse
This command does all the parsing and analysing of input data.
Example
@import parse;
The source data file is parsed, and the results are stored in the location pointed out by ImportResultDir.
Example output from the @import parse command:
Parsed 5 records. Columns in source: 2 using ',' delimiter
INDEX NAME TYPE NULLABLE FROM ROW
----- -------- ---------- -------- --------
0 NAME String(6) No 2
1 BIRTHDAY String(16) No 4
Total bytes: 73 B
The information shows the number of parsed records along with the number of columns found. If the parsed file is a CSV file, the delimiter used is also printed.
For each column the following information is printed:
- INDEX: The index of the column.
- NAME: If the @import set parameter HeaderStartRow was specified and header information was extracted, the extracted column name is printed.
- TYPE: The type of data found. The size declaration (e.g. 16) represents the longest string found.
- NULLABLE: Whether the column is nullable or not.
- FROM ROW: The row in the source file from which the data type (e.g. String) was determined. This number serves as a hint for investigating the source data when an unexpected type is analyzed. E.g. the column name BIRTHDAY in the source data indicates that this data should be a date; by investigating the source data at row 4 you may find the reason why the column was analyzed as a String column.
@import target
This command is responsible for all preparation of the target table prior to import. This includes dropping, truncating, deleting from, and creating the table.
When this command is executed, a check is done whether the input data will actually fit the table. This is done by comparing the analysis result with the specified target and column mapping. Depending on the parameters, the check is performed at different points. The parameters for this command are:
Parameter | Default | Description |
---|---|---|
Catalog | | The catalog of the target table. |
CleanData | | Specifies if data should be cleared before import. Values: Drop - the table is dropped before import; if this is used, either CreateTableSQL or CreateTableSQLFile must be specified. Clear - the table is cleared, either through the use of TRUNCATE or DELETE; the default method is TRUNCATE, and you can override it by specifying the ClearTableMethod parameter. Before the table is cleared, the check of the input data against the target table is performed. |
ClearTableMethod | Truncate | Truncate or Delete. |
ColumnMapping | | Specifies the mapping of source columns to target columns. The default is to import the source columns into the target table columns by index: the first column in the source is imported into the first column in the table. To specify another order, or to ignore certain columns, use the ColumnMapping parameter. Syntax: ColumnMapping="<src col>=<tgt col>, <src col>=<tgt col>, ...". A source column can be identified by index starting with 0 or by its name. A target column can be identified by index starting with 1 or by its name. Example: ColumnMapping="0=2, 1=3" or ColumnMapping="id=no, color=col". Overriding type: The mapping also supports overriding the type information of an input data column, i.e. overriding the type information deduced by DbVisualizer when parsing/analysing the data. The syntax in this case, for a single column mapping, is <src col>(<type spec>)=<tgt col>, where <type spec> is one of String, Date, Time, Timestamp, Number, Decimal Number, Boolean, BLOB and CLOB. Example: ColumnMapping="id(Number)=no, color=col" |
ColumnMappingFile | | A reference to a file containing the column mappings. Same syntax as for the ColumnMapping parameter, with the addition that column entries may be specified on separate rows. Single line comments (using -- or //) and block comments (using /*...*/) are also supported. |
CreateTableSQL | | The SQL needed to create the table to import to. This parameter or the CreateTableSQLFile parameter is required if CleanData="Drop" is specified. |
CreateTableSQLFile | | A reference to a file containing the SQL to create the table. |
DropTableSQL | | The SQL for dropping the table. |
FailOnDropFailure | false | If true, the import will fail if the DROP table statement fails. |
Schema | | The schema of the target table. |
SkipValidateColumns | | A comma-separated list of target column names for which validation shall not be done. Example: SkipValidateColumns="Acol,Bcol" |
SkipValidateColumnNumbers | | A comma-separated list of target column numbers for which validation shall not be done. Example: SkipValidateColumnNumbers="1,5" |
SkipValidateJdbcTypes | | A comma-separated list of JDBC type names for which validation shall not be done. Example: SkipValidateJdbcTypes="INTEGER,TINYINT,VARCHAR" |
Table | | The target table to import to. |
UseDelimitedIdentifiers | false | true or false. If true, object identifiers will be delimited. |
Some examples
Example 1
@import target Table="MyTable" ColumnMapping="0=ID,2=NAME" CleanData="Drop"
CreateTableSQL="CREATE TABLE MyTable (id SMALLINT, name VARCHAR(45))";
Target table is the "MyTable" table. The table is dropped and recreated before import. We are mapping the first column of the input data to the target column "ID" and the third column to the target column "NAME".
As the table is dropped we need to supply the DDL/SQL for creation of the table.
Example 2
@import target Table="MyTable" ColumnMapping="2=NAME" CleanData="Clear"
The table is cleared before the import. The clear method is determined by DbVisualizer. For Databases supporting this Truncate is used.
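Example 3
The mappings can also be read from a file using the ColumnMappingFile parameter. A sketch, where the file name and mappings are examples only; as described above, entries may be placed on separate rows and comments are allowed:
-- contents of the mapping file mytable.mapping
0=ID /* first source column to target column ID */
2=NAME -- third source column to target column NAME
@import target Table="MyTable" ColumnMappingFile="mytable.mapping";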
@import execute
Run the actual import.
Parameter | Default | Description |
---|---|---|
BatchImport | true | true or false. Using batch import will significantly improve the import speed. Note though that batch import may not be supported by all databases or JDBC drivers. In error situations it is also a good idea to switch off batch import. |
BatchSize | 100 | For every 100 (or the specified number of) rows being inserted, DbVisualizer runs a commit. |
FailOnInsertFailure | false | If true, the import will fail if an INSERT statement towards the database fails. |
Examples
@import execute BatchImport="false";
Run the import without using batch import.
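A BatchSize sketch; with the value below, a commit is run after every 500 inserted rows instead of the default 100:
@import execute BatchSize="500";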
Examples
Selecting data to import and mapping columns
CSV File delimited by exclamation mark "!".
Volvo!XC90
BMW!F32 4 Series
Volvo!XC60
Mercedes!C197 SLS AMG
Note that the first column of the CSV file is the brand name (Volvo) of the car. The table we are importing into has the columns in the opposite order: model, brand. I.e. we need to map the columns.
CREATE TABLE carmodel (model VARCHAR(50), brand VARCHAR(50));
@import on;
@import set ImportSource="cars.csv" CsvColumnDelimiter="!" SkipRowsStartingWith="BMW";
@import parse;
@import target Table="carmodel" ColumnMapping="0=brand,1=model";
@import execute;
Parameters used
CsvColumnDelimiter="!" specifying that the data is delimited by the character '!'.
SkipRowsStartingWith="BMW" We are not importing BMW cars
ColumnMapping="0=brand,1=model" Column 0 of the CSV file is mapped to the brand column of the table. Column 1 is mapped to the model column.
The table carmodel content after import:
model brand
------------ --------
XC90 Volvo
XC60 Volvo
C197 SLS AMG Mercedes
Overriding analysed type information
When an input file is parsed, DbVisualizer analyses the data to determine the data types of the input columns. The algorithm for this is quite coarse, so DbVisualizer offers a way to override the analysed data type.
CSV data
NAME,BIRTHDAY
August, "Sat, 21 Jul 1962"
Sven, "Fri, 21 Jan 1972"
Lotta, "NoData"
Bert, "Sat, 21 Jul 1962"
Note that the birthday of "Lotta" is "NoData" which is of course not a valid date. When DbVisualizer parses/analyses the data, it will come to the conclusion that the BIRTHDAY column is a String.
The result of @import parse will contain a table describing information about the data that was parsed.
Parsed 5 records. Columns in source: 2 using ',' delimiter
INDEX NAME TYPE NULLABLE FROM ROW
----- -------- ---------- -------- --------
0 NAME String(6) No 2
1 BIRTHDAY String(16) No 4
Total bytes: 73 B
As mentioned earlier, you can see that the column BIRTHDAY has been interpreted as a String. This was determined when examining row 4 (the FROM ROW column is 4).
When inserting this column into the database, DbVisualizer would set the values as strings. This would result in a total import failure, with no rows inserted into the database.
This may be addressed by overriding the analysed type for BIRTHDAY (String) and setting the type to Date.
The SQL
CREATE TABLE birthdays (name VARCHAR(40), birthday DATE);
@import on;
@import set ImportSource="birthdays.csv" CsvTextQuotedBetween="Double" DateFormat="EEE, d MMM yyyy" HeaderStartRow="1";
@import parse;
@import target Table="birthdays" ColumnMapping="0=name,1(date)=birthday";
@import execute;
@import off;
Parameters
- DateFormat="EEE, d MMM yyyy" Defining the format to be able to interpret the dates in the CSV file.
- ColumnMapping="0=name,1(date)=birthday" Note the 1(date)=birthday where we are mapping the column with index 1 to the target column birthday. The (date) part specifies that column 1 should be interpreted as a date.
The result of the import using the script above is that 3 rows are imported (August, Sven and Bert). The row representing Lotta is reported as a failure, as indicated below.
1 Record affected, Record: 0 originating at row: 4
DataRecordException: Convert error, Column: BIRTHDAY at index: 1
DataTypeConversionException: Value is 'NoData'. Not a valid date format. Valid format: 'EEE, d MMM yyyy'
Importing fixed column width input data (TxtColumns parameter)
Text File
001 APPLE
002 LEMON
003 ORANGE
The SQL Script
@import on;
@import set ImportSource="fruitlist.txt" TxtColumns="0,6";
@import parse;
@import target Table="fruitslist" Catalog="test";
@import execute;
The TxtColumns parameter specifies the column character positions. In this case the first column starts at character position 0 and the second column starts at character position 6.
The resulting imported table is
id name
-- ------
1 APPLE
2 LEMON
3 ORANGE
Note how "LEMON" is imported without proceeding blanks. This is because column values are trimmed (TxtTrim parameter default is true).
The TxtColumns parameter
The TxtColumns parameter supports a number of syntaxes as explained in the examples below.
An example when parsing the row "AAA BBB CCC":
TxtColumns parameter | Yields extracted columns |
---|---|
0-2, 4-5 | "AAA" "BB" |
1-3, 8-9 | "AA" "CC" |
Omitting the end index (as in the SQL script above)
TxtColumns parameter | Yields extracted columns |
---|---|
0, 4, 8 (same as 0-3, 4-7, 8-end of line) | "AAA" "BBB" "CCC" |
1, 8-9 (same as 1-7, 8-9) | "AA BBB" "CC" |
Using the "+" sign
TxtColumns parameter | Yields extracted columns |
---|---|
0+3, 4+3, 8 (same as 0-3, 4-7, 8-end of line) | "AAA" "BBB" "CCC" |
0+7, 8+1 (same as 0-7, 8-9) | "AAA BBB" "CC" |
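As a sketch, the explicit range syntax could be used in the fruit list script above instead of open-ended start positions, here with trimming switched off (the positions are illustrative):
@import set ImportSource="fruitlist.txt" TxtColumns="0-2, 6-11" TxtTrim="false";
With TxtTrim="false" the extracted column values keep any surrounding blanks.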
Importing Excel data
Excel file
A B
-- --
1 A
2 #DIV/0!
3 C
Note how row 2, column B, has the value #DIV/0!. This value represents a case where a cell is calculated using a formula and the calculation produces an error.
SQL Script to import
CREATE TABLE exceldata (id INT, value VARCHAR(50));
@import on;
@import set ImportSource="excelData.xlsx" ExcelSheetName="mydata" ExcelCellPolicyError="SKIP_ROW";
@import parse;
@import target Table="exceldata";
@import execute;
@import off;
Parameters used
ExcelSheetName="mydata" Specifying that the sheet named mydata is is the sheet to import.
ExcelCellPolicyError="SKIP_ROW"; Specifies that if we find a formula that produced an error we should skip the complete row.
Resulting Table
id value
-- -----
1 A
3 C
Note that row 2 is not imported, as instructed by ExcelCellPolicyError="SKIP_ROW".
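If the sheet name is unknown, the sheet can instead be addressed by index with ExcelSheetId; remember that ExcelSheetId and ExcelSheetName cannot be combined. A sketch using the same file:
@import set ImportSource="excelData.xlsx" ExcelSheetId="0" ExcelCellPolicyError="SKIP_ROW";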
Continuing an import that has failed
If any of the input data cannot be imported, DbVisualizer will keep track of this by storing the failed data in a specific errorRecords.drec file in the directory where the import process stores its intermediate results (see the ImportResultDir parameter).
Following is an example where the import fails, and which also shows how to import the failed data again. Specifically, the first import fails as Clementine is a name that is too long to fit in the target table column.
ID,FRUIT,PRICE
1,Banana,1.22
2,Clementine,0.35
3,Orange,0.55
First Import SQL Script
CREATE TABLE fruits (id SMALLINT, name VARCHAR(6), price DECIMAL(10,2));
@import on Clean="true" ImportResultDir="/tmp/importContinueFirstImport";
@import set ImportSource="fruits.csv" HeaderStartRow="1";
@import parse;
@import target Table="fruits" SkipValidateJdbcTypes="VARCHAR";
@import execute;
@import off;
Note: the table definition for the fruits table defines the column name to be VARCHAR(6). The input data "Clementine" will not fit there.
Parameters used
ImportResultDir="/tmp/importContinueFirstImport" We get our results in this directory. Makes it easy to refer in the second import script.
SkipValidateJdbcTypes="VARCHAR" Tells DbVisualizer not to validate VARCHAR columns. This is done for the purpose of this example. If not specified, the import would stop before any data has been imported.
When running this import the Database used in the example (MySQL) will fail when the import tries to insert the data row 3 as Clementine will not fit the column. The Failure printed in the DbVisualizer Log for this looks something like:
1 Record affected, Record: id = 0 originating at row: 3
DataRecordException: Error when importing data.
MysqlDataTruncation: Data truncation: Data too long for column 'name' at row 1
Note that the source data is pinpointed as originating at row: 3.
The fruits table content after the import:
id name price
-- ------ -----
1 Banana 1.22
3 Orange 0.55
Second import SQL Script
ALTER TABLE fruits MODIFY COLUMN name VARCHAR(20);
@import on ImportResultDir="/tmp/importContinueSecond" UseImportFile="/tmp/importContinueFirstImport/errorRecords.drec" Clean="true";
@import execute;
@import off;
The ALTER statement deals with the root cause of the failed import: the name column was too small.
Note that when continuing an import, the @import set and @import target commands are not specified; the settings and target from the previous import are used.
Parameters used
ImportResultDir="/tmp/importContinueSecond": Specifies a separate directory for the results.
UseImportFile="/tmp/importContinueFirstImport/errorRecords.drec": Pinpoints the file containing the data that could not be imported.
The fruits table content after the second import:
id name price
-- ---------- -----
1 Banana 1.22
3 Orange 0.55
2 Clementine 0.35
Testing an import - Dry Run
The client-side import offers a way to run the import script to perform all client side data validation without changing anything in the database. This is done using the @set dryrun command.
Note that when running the import without dry run, the import may nevertheless fail due to checks on the database side, e.g. primary key or unique key constraint checks.
@import set ImportSource="fruits.csv";
@import parse;
@set dryrun;
@import target Table="fruits" CleanData="Clear";
@import execute;
@set dryrun off;
@import off;
When running the script, no actual clearing of the table is performed, even though the CleanData parameter indicates it, nor does @import execute lead to any rows being inserted in the database. Since the @set dryrun command is issued prior to these commands, no changes are made to the database table.