View Source csv (util v1.3.4)
CSV file parsing functions
Link to this section Summary
Functions
Guess the type of data by its value
Guess data types of fields in the given CSV list of rows obtained by parsing a CSV file with parse(File,[fix_lengths])
. The function returns a list of tuples {Type, MaxFieldLen, NumOfNulls}
, where the Type
is a field type, MaxFieldLen
is the max length of data in this column, and NumOfNulls
is the number of rows with empty values in this column.
Load CSV data from a File
to a MySQL
database. Tab
is the name of a table where to load data. MySqlPid
is the pid of a MySQL database connection returned by mysql:start_link/1
. The data in the table is replaced according to {import_type, Type}
Get max field lengths for a list obtained by parsing a CSV file with parse_csv_file(File,[fix_lengths])
.
Parse a CSV file using default options.
Parse a given CSV file.
Parse a CSV line
Link to this section Types
-type load_options() ::
[{load_type, recreate | replace | ignore_dups | update_dups} |
{col_types, #{binary() => ColType :: atom() | {ColType :: atom(), ColLen :: integer()}}} |
{batch_size, integer()} |
{blob_size, integer()} |
{create_table, boolean()} |
{save_create_sql_to_file, string()} |
{transforms, #{binary() => fun((term()) -> term())}} |
{guess_types, boolean()} |
{guess_limit_rows, integer()} |
{max_nulls_pcnt, float()} |
{primary_key, PKColumns :: binary() | [binary() | list()]} |
{indexes,
[{IdxName :: string() | binary(), [Col :: string() | binary()]} |
{IdxName :: string() | binary(), [Col :: string() | binary()], unique | undefined}]} |
{drop_temp_table, boolean()} |
{encoding, string() | atom()} |
{verbose, boolean() | integer()}].
Options for loading data to a database.
- {load_type, Type}
- Type of loading to perform.
recreate
will replace the table by atomically dropping the old one, creating/loading the new one, and replacing the table.replace
will do an insert by usingREPLACE INTO
statement.ignore_dups
will useINSERT IGNORE INTO
statement to ignore records with duplicate keys.update_dups
will do anINSERT INTO
andON DUPLICATE KEY UPDATE
, so that the old records are updated and the new ones are inserted. - {create_table, Allow}
- Allow to create a table if it doesn't exist
- {col_types, Map}
- Types of data for all or some columens. The Map is in the format:
ColName::binary() => ColInfo
, where ColInfo isColType | {ColType, ColLen::integer()}
, andColType
is:date | datetime | integer | float | blob | number
. - {transforms, Map}
- Value transformation function for columns. The Map is in the format:
ColName::binary() => fun((Value::term()) -> term())
. - {batch_size, Size}
- Number of records per SQL insert/update/replace call
- {blob_size, Size}
- Threshold in number of bytes at which a VARCHAR field is defined as BLOB
- {save_create_sql_to_file, Filename::string()}
- Save CREATE TABLE sql statement to a file
- guess_types
- When specified, the function will try to guess the type of data in columns instead of treating all data as string fields. The possible data typed guessed: integer, float, date, datetime, number, string
- {guess_limit_rows, Limit}
- Limit the number of rows for guessing the column data types
- {max_nulls_pcnt, Percent}
- A percentage threshold of permissible NULLs in a column (0-100), above which the column data type is forced to be treated as
string
- {primary_key, Fields}
- Names of primary key fields in the created table
- {indexes, Indexes}
- Indexes to create
- {drop_temp_table, boolean()}
- When true (default), temp table is dropped.
- {encoding, Encoding}
- The name of the encoding to use for storing data. For the list of permissible values [see this link](https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html)
- verbose
- Print additional details to stdout
-type parse_options() ::
[fix_lengths | binary | list |
{open, list()} |
{columns, [binary() | string()]} |
{converters,
[{binary() | string() | all,
fun((binary(), binary()) -> binary()) | {rex, binary(), binary()}}]}].
CSV Parsing Options.
- fix_lengths
- if a record has a column count greater than what's found in the header row, those extra columns will be dropped, and if a row has fewer columns, empty columns will be added.
- binary
- Return fields as binaries (default)
- list
- Return fields as lists
- {open, list()}
- Options given to file:open/2
- {columns, Names}
- Return data only in given columns
- {converters, [{Col, fun((ColName, Value) -> NewValue)| {rex, RegEx, Replace}]}
- Data format converter. If
Col
isall
, the same converting function is used for all columns. If the converter is a{rex, RegEx, Replace}
then the regular expression replacement will be run on a value in the requested column.
Link to this section Functions
-spec guess_data_type(binary()) -> {null | date | datetime | integer | float | string, term(), string()}.
Guess the type of data by its value
-spec guess_data_types(HasHeaderRow :: boolean(), Rows :: [Fields :: [binary()]]) ->
{Type :: string | integer | number | float | date | datetime,
MaxFieldLen :: integer(),
NumOfNulls :: integer()}.
Guess data types of fields in the given CSV list of rows obtained by parsing a CSV file with parse(File,[fix_lengths])
. The function returns a list of tuples {Type, MaxFieldLen, NumOfNulls}
, where the Type
is a field type, MaxFieldLen
is the max length of data in this column, and NumOfNulls
is the number of rows with empty values in this column.
-spec load_to_mysql(File :: string(), Tab :: string(), MySqlPid :: pid(), Opts :: load_options()) -> {Columns :: list(), AffectedCount :: integer(), RecCount :: integer()}.
Load CSV data from a File
to a MySQL
database. Tab
is the name of a table where to load data. MySqlPid
is the pid of a MySQL database connection returned by mysql:start_link/1
. The data in the table is replaced according to {import_type, Type}
:
recreate
- The table is entirely replaced with the data from file. The data from the file is loaded atomically - i.e. either the whole file loading succeeds or fails. This is accomplished by first loading data to a temporary table, and then using the database's ACID properties to replace the target table with the temporary table.replace
- Use "REPLACE INTO" instead of "INSERT INTO" existing tableignore_dups
- The insert in the existing table is performed and the records with duplicate keys are ignoredupdate_dups
- The insert in the existing table is performed and the records with duplicate keys are updatedupsert
- The insert/update in the existing table is performed without creating a temporary table
NOTE: this function requires https://github.com/mysql-otp/mysql-otp.git
-spec max_field_lengths(HasHeaderRow :: boolean(), Rows :: [Fields :: list()]) -> [Len :: integer()].
Get max field lengths for a list obtained by parsing a CSV file with parse_csv_file(File,[fix_lengths])
.
-spec parse(string()) -> [[binary()]].
Parse a CSV file using default options.
-spec parse(binary() | string(), parse_options()) -> [[string()]].
Parse a given CSV file.
-spec parse_line(binary()) -> list().
Parse a CSV line