@import - Importing data
Only in DbVisualizer Pro
This feature is only available in the DbVisualizer Pro edition.
Instead of using the GUI to import data you can use client-side commands, i.e. @import. This enables you to use the DbVisualizer command-line interface to automate your imports and to combine them with other client-side commands such as @export, @mail, and others. The @import command supports the following formats:
- Excel: the xlsx format or the legacy xls format
- XML: the same XML formats that can be exported with DbVisualizer
- JSON: the same JSON formats that can be exported with DbVisualizer
- CSV: importing CSV files supports many configurations, such as multi-symbol column separators, multi-line values, etc.
- TXT: importing fixed-width text files
The client-side commands for @import are:
Command | Description |
---|---|
@import on | Starts an import session |
@import set | Sets parameters for the import |
@import parse | Parses and converts the source data to an intermediary format, and analyzes the data |
@import target | Identifies the target table and what target columns should be used |
@import execute | Runs the import |
The import process is explained by the following figure.
The @import parse step (1) lays the foundation for the database import: based on the source file format (csv, json, xlsx, etc.), it parses the data into an intermediary, internal file. This file is then analyzed to detect widths, data types and their sizes based on the data, row and cell references back to the source file, and a lot more.
The intermediary format stores the input data as Records associated with information from where the data originates.
Once the data has been parsed, some basic tests are performed to see if the properties of the data are compatible with the target table properties, e.g. whether the size of the input data fits the targeted table columns. This is done when the @import target command is run (2).
The final step of an import is to execute (3-4) the actual operations towards the database to import the data, i.e. to execute the INSERT statements. Any failure in the import includes a reference to where in the source file the invalid data originates.
Following is a complete example where a simple CSV file is imported into a target table.
Importing data - Example
The example shows how data about fruits in a CSV file fruits.csv is imported into a database table fruits. It is a minimal example where the columns of the CSV file are mapped directly to the table columns.
1,Banana,1.22
2,Apple,0.35
3,Orange,0.55
The SQL needed to import the file
@import on;
@import set ImportSource="fruits.csv";
@import parse;
@import target Table="FRUITS";
@import execute;
@import off;
Running the import script in DbVisualizer
As seen in the screenshot above, 3 records were inserted, and 1 was skipped (the header).
Read further on in the guide for how you can tailor your import using the different parameters of the import commands.
The @import commands
@import on
The @import on command initializes the import session. When a client-side command import session is started, an output folder is created where DbVisualizer generates data representing intermediate results of the import. The output folder is created under a root folder (importresults) in DbVisualizer's current workspace (by default, the .dbvis folder in the user's home folder). The name of the folder is generated to be unique.
These folders are automatically cleaned by DbVisualizer regularly.
Note: Normally there is no need to specify any parameters to the @import on command. Default values should be enough for most uses.
Parameter | Default | Valid Values |
---|---|---|
ImportResultRoot | <WORKSPACE>/importresults | The root folder for parsing results and other temporary files produced by the import session. Note: Any results produced in this directory will automatically be removed on a regular basis. |
ImportResultDir | Automatically generated as a sub folder to <WORKSPACE>/importresults | The path where the DbVisualizer import results are put. If the path is an absolute path, the results are stored in this folder. If it is not an absolute path, it is assumed to be a path relative to ImportResultRoot. The folder is created when the import is started. If the folder already exists, the import fails unless the Clean parameter is also used. If neither ImportResultRoot nor ImportResultDir is specified, DbVisualizer will create a new uniquely named directory for every import session. |
Clean | false | If true, the ImportResultDir is cleaned before import. If false, the import fails if the directory exists. There is no need to specify this parameter if ImportResultDir is not specified. |
ContinueImportFile | | Specified if you are continuing an import. See the section Continuing an import that has failed below. |
Example 1
@import on ImportResultDir=/tmp/myimport Clean="true";
Import results are stored in the folder /tmp/myimport. Any existing folder /tmp/myimport will be deleted/cleaned.
Example 2
@import on ImportResultDir="newimport";
Import results are stored in <WORKSPACE>/importresults/newimport. If this directory exists, the import will fail.
@import set
The @import set command takes a parameter name followed by an equal sign and a value, e.g. parameter="value". You can use the following parameters, where ImportSource is the only required parameter. All names are case-insensitive. Note that you may use multiple @import set commands, but the first one must include the ImportSource parameter setting the input data to import.
Generic parameters for all import formats
These parameters are valid for all import formats.
Parameter | Default | Valid Values |
---|---|---|
BooleanFalseFormats | false, no, 0, off | Comma-separated string of values that should be interpreted as boolean false. |
BooleanTrueFormats | true, yes, 1, on | Comma-separated string of values that should be interpreted as boolean true. |
DateFormat | yyyy-MM-dd | See valid formats in Changing the Data Display Format document. |
DecimalSeparator | . | The decimal separator to use when parsing decimal numbers |
Encoding | As set in Tool Properties | The file encoding to use when parsing the file to import |
ErrorIncludeStackTrace | false | If true the Java stack trace of any exception will be included in error messages. |
FailOnConvertFailure | false | If true, the Import will fail if data conversion fails. |
FailOnNoColumnsFoundFailure | false | If true, the Import will fail if we found no columns during continue import |
FailOnParseFailure | false | If true, the Import will fail if we got a failure during parsing of the source data. |
Format | | Specifies the format of the file to import. Normally the format is auto-detected, which means that this parameter is optional. |
HeaderStartRow | 0 | The row index of the row where the header starts. If set to a number n, the StartRowOfData parameter is automatically set to n + 1. The default value 0 indicates that the source data has no header information. |
ImportSource | | A path to the file to import. Must be included in the first @import set command. The path is an absolute path or a path relative to the script location. If a @cd command has been run before the @import set command, a relative path is relative to the @cd directory. |
MaxRows | -1 | The maximum number of rows to parse/import. |
ShowNullAs | | The value that should be considered as NULL, e.g. (null). |
SkipEmptyRows | true | If true, empty rows in the source data are skipped. |
SkipHeader | true | If set to false, the header is also imported. |
SkipRowsStartingWith | | A string; rows starting with this string are skipped (see Example 1 below). |
StartRowOfData | 1 | The row index of the row where data starts. See also HeaderStartRow. |
ThousandSeparator | , | The thousand separator to use. |
TimeFormat | HH:mm:ss | See valid formats in Changing the Data Display Format document. |
TimeStampFormat | yyyy-MM-dd HH:mm:ss.SSSSS | See valid formats in Changing the Data Display Format document. |
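As a sketch of how the format parameters combine, the following (assuming a hypothetical file sales.csv with European-style dates and numbers, such as 31/12/2024 and 1.234,56) overrides the date format and the number separators:
@import on;
@import set ImportSource="sales.csv" DateFormat="dd/MM/yyyy" DecimalSeparator="," ThousandSeparator=".";
@import parse;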
CSV parameters
Parameter | Default | Valid Values |
---|---|---|
CsvColumnDelimiter | | Can be used to override the auto-detection by specifying a column delimiter, e.g. CsvColumnDelimiter=";". The most common column delimiters are auto-detected. |
CsvColumnDelimiterType | Auto Detect | Valid values are: String or Auto Detect. If only CsvColumnDelimiter is specified in the command, CsvColumnDelimiterType is set to String. |
CsvTextQuotedBetween | None | Quoting data is needed if the data itself contains the delimiter used to separate columns, special characters such as tabs, multi-line separators, etc. The CsvTextQuotedBetween parameter specifies the identifier used to enclose this type of data. Please note that there may not be multiple text quote identifiers in the same import source. Any character is valid to use. These aliases can be used to make the delimiters readable: - Single: same as specifying the value as "'" - Double: same as specifying the value as """" (note that a " character needs to be escaped with an additional ") - None: no quote character (default). Example 1: CsvTextQuotedBetween="Single" Ex: 1 'Apple' 'Round fruit' Example 2: CsvTextQuotedBetween="Double" Ex: 2 "Lemon" "Yellow fruit" Example 3: CsvTextQuotedBetween="|" Ex: 2 |Lemon| |Yellow fruit| If you have quoted data in your import source and do not specify CsvTextQuotedBetween, the data will be handled as text data. |
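As a minimal sketch, assuming a hypothetical file products.csv that is semicolon-delimited with text values quoted in double quotes:
@import set ImportSource="products.csv" CsvColumnDelimiter=";" CsvTextQuotedBetween="Double";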
Excel parameters
Parameter | Default | Valid Values |
---|---|---|
ExcelCellPolicyError | SKIP | Defines what to do for a formula cell where the cached value indicates an error. Possible values include SKIP and SKIP_ROW; see the Importing Excel data example below. |
ExcelSheetId | 0 | An id specifying the sheet in a workbook. The first sheet has id = 0. Either this parameter or the ExcelSheetName parameter can be used, not both. |
ExcelSheetName | | A name specifying the sheet in a workbook. |
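As a sketch, assuming a hypothetical workbook report.xlsx where the data to import is on the second sheet:
@import set ImportSource="report.xlsx" ExcelSheetId="1";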
TXT parameters
Parameter | Default | Valid Values |
---|---|---|
TxtColumns | | Used when fixed-width text files are imported, e.g. TxtColumns="0, 4". For the detailed syntax of the TxtColumns parameter, please see the example Importing fixed column width input data below. |
TxtTrim | true | If true, the data will be trimmed. |
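As a sketch, assuming a hypothetical fixed-width file accounts.txt with columns starting at character positions 0, 10 and 25, where surrounding blanks should be kept:
@import set ImportSource="accounts.txt" TxtColumns="0,10,25" TxtTrim="false";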
Example 1
@import set StartRowOfData="5" SkipRowsStartingWith="//";
We are starting to import data from row 5 of the source data file and skipping rows starting with "//".
Example 2
@import set HeaderStartRow="1";
The input data has header information at row 1. We are starting to import data from row 2; as StartRowOfData is not explicitly set, it is automatically set to 2.
@import parse
This command does all the parsing and analyzing of the input data.
Example
@import parse;
The source data file is parsed. The result is stored in the folder defined by ImportResultDir.
Example output generated by the @import parse command:
Parsed 5 records. Columns in source: 2 using ',' delimiter
INDEX NAME TYPE NULLABLE FROM ROW
----- -------- ---------- -------- --------
0 NAME String(6) No 2
1 BIRTHDAY String(16) No 4
Total bytes: 73 B
The information shows the number of parsed records along with the number of columns found. If the parsed file was a CSV file, the delimiter used is also printed.
For each column the following information is printed:
- INDEX: The index of the column.
- NAME: If the @import set parameter HeaderStartRow was specified and header information was extracted, the extracted column name is printed.
- TYPE: The type of data found. The size declaration (e.g. 16) represents the longest string found.
- NULLABLE: Whether the column is nullable or not.
- FROM ROW: From which row in the source file the data type (e.g. String) was determined. This number serves as a hint for investigating the source data when an unexpected type is analyzed. E.g. the column name BIRTHDAY in the source data indicates that this data should be a date; by investigating the source data at row 4 you may find the reason why the column was analyzed as a String column.
@import target
This command is responsible for all preparation of the target table prior to import. This includes dropping, truncating, deleting from, and creating the table.
When this is executed, a check is done to see if the input data will actually fit the table. This is done by comparing the analysis result with the specified target and column mapping. Depending on the parameters, the check is performed at different points. Parameters for this command are:
Parameter | Default | Valid Values |
---|---|---|
Catalog | | The target table catalog. |
CleanData | | Specifies if data should be cleared before import. Possible values include Clear and Drop; see the examples below. |
ClearTableMethod | | Specifies how the table data should be emptied prior to importing the new data. |
ColumnMapping | | Specifies the mapping of source columns to target columns. The default is to import the source columns into the target table columns by index: the first column in the source is imported into the first column in the table. To specify another order, or to ignore certain columns, use the ColumnMapping parameter. Syntax: ColumnMapping="<source column>=<target column>, <source column>=<target column>, ...". A source column can be identified by index starting with 0, or by its column name. A target column can be identified by index starting with 1, or by its table column name. Example: ColumnMapping="0=2, 1=3" or ColumnMapping="id=no, color=col". The mapping also supports overriding the data type information of an input data column, i.e. overriding the type information deduced by DbVisualizer when parsing/analyzing the data. The syntax in this case, for a single column mapping, is: <source column>(<data type>)=<target column> where <data type> is one of: String, Date, Time, Timestamp, Number, Decimal Number, Boolean, BLOB, and CLOB. Example: ColumnMapping="id(Number)=no, color=col" |
ColumnMappingFile | | A reference to a file containing the column mappings. Same syntax as for the ColumnMapping parameter, with the addition that column entries may be specified on separate rows. Single line comments (using -- or //) and block comments (using /*...*/) are also supported. See the sketch after the examples below. |
CreateTableSQL | | The SQL needed to create the table to import to. This parameter or the CreateTableSQLFile parameter is required if CleanData="Drop" is specified. |
CreateTableSQLFile | | A file reference to a file containing the SQL to create the table. |
DropTableSQL | | The SQL used to remove (drop) the table. |
FailOnDropFailure | false | If true, the import will fail if the DROP table statement fails. |
Schema | | The target table schema to import to. |
SkipValidateColumns | | A comma-separated list of target column names for which validation shall not be done. Example: SkipValidateColumns="Column1,Column2" |
SkipValidateColumnNumbers | | A comma-separated list of target column numbers for which validation shall not be done. Example: SkipValidateColumnNumbers="1,5" |
SkipValidateJdbcTypes | | A comma-separated list of JDBC type names for which validation shall not be done. Example: SkipValidateJdbcTypes="INTEGER,TINYINT,VARCHAR" |
Table | | The target table to import to. |
UseDelimitedIdentifiers | false | If true, delimited (quoted) identifiers are used when referring to the target table and columns. |
Some examples
Example 1
@import target Table="MyTable" ColumnMapping="0=ID,2=NAME" CleanData="Drop"
CreateTableSQL="CREATE TABLE MyTable (id SMALLINT, name VARCHAR(45))";
The target table is the MyTable table. The table is dropped and recreated before the import. The first column of the input data is mapped to the target column ID and the third column to the target column NAME.
As the table is dropped, we need to supply the DDL/SQL to create the table.
Example 2
@import target Table="MyTable" ColumnMapping="2=NAME" CleanData="Clear"
@import execute
This command will run the import.
Parameter | Default | Valid Values |
---|---|---|
BatchImport | true | If true, the rows are inserted in batches; if false, each row is inserted individually. |
BatchSize | 100 | For every 100 (or the specified number of) rows being inserted, DbVisualizer will run a database commit. |
FailOnInsertFailure | false | If true, the import will fail if an INSERT statement fails. |
Example:
Run the import in non-batch mode:
@import execute BatchImport="false";
Examples
Selecting data to import and mapping columns
A CSV file delimited by an exclamation mark ("!"):
Volvo!XC90
BMW!F32 4 Series
Volvo!XC60
Mercedes!C197 SLS AMG
CREATE TABLE carmodel (model VARCHAR(50), brand VARCHAR(50));
@import on;
@import set ImportSource="cars.csv" CsvColumnDelimiter="!"
SkipRowsStartingWith="BMW";
@import parse;
@import target Table="carmodel" ColumnMapping="0=brand,1=model";
@import execute;
Parameters used
CsvColumnDelimiter="!"
specifies that the data is delimited by an exclamation mark ("!").SkipRowsStartingWith="BMW"
do not import entires starting with "BMW".ColumnMapping="0=brand,1=model
column 0 of the CSV file is mapped to the brand column of the database table. Column 1 is mapped to the model column.
The table carmodel content after import:
model brand
------------ --------
XC90 Volvo
XC60 Volvo
C197 SLS AMG Mercedes
Overriding analyzed type information
When an input file is parsed, DbVisualizer analyzes the data to determine the data types of the input data. The algorithm for this is quite coarse, and it is therefore possible to override the analyzed data type, as this example shows.
CSV data
NAME,BIRTHDAY
August, "Sat, 21 Jul 1962"
Sven, "Fri, 21 Jan 1972"
Lotta, "NoData"
Bert, "Sat, 21 Jul 1962"
Note that the birthday of Lotta is NoData, which is of course not a valid date. When the data is analyzed, the conclusion will be that the BIRTHDAY column is a String.
The result of @import parse will contain a table describing the data that was parsed.
Parsed 5 records. Columns in source: 2 using ',' delimiter
INDEX NAME TYPE NULLABLE FROM ROW
----- -------- ---------- -------- --------
0 NAME String(6) No 2
1 BIRTHDAY String(16) No 4
Total bytes: 73 B
As mentioned earlier, you can see that the column BIRTHDAY has been interpreted as a String. This was determined when examining row 4 (the FROM ROW column is 4).
When inserting this column into the database, DbVisualizer would insert/set the values as strings. This would result in a total import failure, and no rows would be inserted into the database.
This may be addressed by overriding the analyzed type for BIRTHDAY (String) and setting the type to Date.
The script:
CREATE TABLE birthdays (name VARCHAR(40), birthday DATE);
@import on;
@import set ImportSource="birthdays.csv" CsvTextQuotedBetween="Double" DateFormat="EEE, d MMM yyyy" HeaderStartRow="1";
@import parse;
@import target Table="birthdays" ColumnMapping="0=name,1(date)=birthday";
@import execute;
@import off;
DateFormat="EEE, d MMM yyyy"
Defining the format to be able to interpret the dates in the CSV file.ColumnMapping="0=name,1(date)=birthday"
Note the 1(date)=birthday where we are mapping the column with index 1 to the target column birthday. The (date) part specifies that column 1 should be interpreted as a date.
The result of the import using the script above is that 3 rows are imported (August, Sven and Bert). The row representing Lotta is reported as a failure, as indicated below.
1 Record affected, Record: 0 originating at row: 4
DataRecordException: Convert error, Column: BIRTHDAY at index: 1
DataTypeConversionException: Value is 'NoData'. Not a valid date format. Valid format: 'EEE, d MMM yyyy'
Importing fixed column width input data (TxtColumns parameter)
Text File
001   APPLE
002    LEMON
003   ORANGE
The SQL Script
@import on;
@import set ImportSource="fruitlist.txt" TxtColumns="0,6";
@import parse;
@import target Table="fruitslist" Catalog="test";
@import execute;
The TxtColumns parameter specifies the column character positions. In this case, the first column starts at character position 0 and the second column starts at character position 6.
The resulting imported table is
id name
-- ------
1 APPLE
2 LEMON
3 ORANGE
Note how "LEMON" is imported without proceeding blanks. This is because column values are trimmed (TxtTrim parameter default is true).
The TxtColumns parameter
The TxtColumns parameter supports a number of syntaxes as explained in the examples below.
The examples below show the extracted columns when parsing the row "AAA BBB CCC".
TxtColumns parameter | Yields extracted columns |
---|---|
0-2, 4-5 | "AAA" "BB" |
1-3, 8-9 | "AA" "CC" |
Omitting the end index (as in the SQL script above)
TxtColumns parameter | Yields extracted columns |
---|---|
0, 4, 8 (same as 0-3, 4-7, 8-end of line) | "AAA" "BBB" "CCC" |
1, 8-9 (same as 1-7, 8-9) | "AA BBB" "CC" |
Using the "+" sign
TxtColumns parameter | Yields extracted columns |
---|---|
0+3, 4+3, 8 (same as 0-3, 4-7, 8-end of line) | "AAA" "BBB" "CCC" |
0+7, 8+1 (same as 0-7, 8-9) | "AAA BBB" "CC" |
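As a sketch, the "+" syntax from the last table row could be used in a complete command, assuming a hypothetical file rows.txt containing lines such as "AAA BBB CCC":
@import set ImportSource="rows.txt" TxtColumns="0+7, 8+1";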
Importing Excel data
Excel file
A B
-- --------
1 A
2 #DIV/0!
3 C
Note how row 2, column B has the value #DIV/0!. This value represents a case where a cell is calculated using a formula and the calculation produces an error.
SQL Script to import
CREATE TABLE exceldata (id INT, value VARCHAR(50));
@import on;
@import set ImportSource="excelData.xlsx" ExcelSheetName="mydata" ExcelCellPolicyError="SKIP_ROW";
@import parse;
@import target Table="exceldata";
@import execute;
@import off;
Parameters used
ExcelSheetName="mydata"
Specifying that the sheet named mydata is is the sheet to import.ExcelCellPolicyError="SKIP\_ROW"
Specifies that if we find a formula that produced an error we should skip the complete row.
Resulting Table
id value
-- -----
1 A
3 C
Note that row 2 is not imported, as instructed by ExcelCellPolicyError="SKIP_ROW".
Continuing an import that has failed
If any of the input data cannot be imported, DbVisualizer will keep track of this by storing the failed data in a specific errorRecords.drec file in the directory where the import process stores its intermediate results (see the ImportResultDir parameter).
Following is an example where the import fails. It also shows how to import the failed data again. Specifically, the first import fails as Clementine is a name that is too long to fit in the target table column.
ID,FRUIT,PRICE
1,Banana,1.22
2,Clementine,0.35
3,Orange,0.55
First Import SQL Script
CREATE TABLE fruits (id SMALLINT, name VARCHAR(6), price DECIMAL(10,2));
@import on Clean="true" ImportResultDir="/tmp/importContinueFirstImport";
@import set ImportSource="fruits.csv" HeaderStartRow="1";
@import parse;
@import target Table="fruits" SkipValidateJdbcTypes="VARCHAR";
@import execute;
@import off;
Note: the table definition for the fruits table defines the column name to be VARCHAR(6). The input data "Clementine" will not fit there.
Parameters used
ImportResultDir="/tmp/importContinueFirstImport"
We get our results in this directory. Makes it easy to refer in the second import script.SkipValidateJdbcTypes="VARCHAR"
Tells DbVisualizer not to validate VARCHAR columns. This is done for the purpose of this example. If not specified, the import would stop before any data has been imported.
When running this import, the database used in the example (MySQL) will fail when the import tries to insert the data at row 3, as Clementine will not fit in the column. The failure printed in the DbVisualizer log looks something like this:
1 Record affected, Record: id = 0 originating at row: 3
DataRecordException: Error when importing data.
MysqlDataTruncation: Data truncation: Data too long for column 'name' at row 1
Note that the source data is pinpointed as originating at row: 3.
The table fruits content after the import:
id name price
-- ------ -----
1 Banana 1.22
3 Orange 0.55
Second import SQL Script
ALTER TABLE fruits MODIFY COLUMN name VARCHAR(20);
@import on ImportResultDir="/tmp/importContinueSecond" UseImportFile="/tmp/importContinueFirstImport/errorRecords.drec" Clean="true";
@import execute;
@import off;
The ALTER statement deals with the root cause of the import failure: the name column was too small.
Note that when continuing an import, the @import set and @import target commands are not specified. The settings and target from the original import are used.
Parameters used
ImportResultDir="/tmp/importContinueSecond"
Specifying a separate directory for the resultsUseImportFile="/tmp/importContinueFirstImport/Results/errorRecords.drec"
Pinpointing the file containing the data that contains the data that could not be imported.
The table fruits content after the second import:
id name price
-- ---------- -----
1 Banana 1.22
3 Orange 0.55
2 Clementine 0.35
Testing an import - Dry Run
The client-side import offers a way to run the import script and perform all client-side data validation without changing anything in the database. This is done using the @set dryrun command.
Note that when running the import without dry run, the import may nevertheless fail due to checks on the database side, e.g. primary key or unique key constraint checks.
@import set ImportSource="fruits.csv";
@import parse;
@set dryrun;
@import target Table="fruits" CleanData="Clear";
@import execute;
@set dryrun off;
@import off;
Since there is a @set dryrun command prior to these commands, no changes to the database table will be performed: the table is not cleared as CleanData indicates, nor does the @import execute lead to any rows being inserted in the database.