csv (util v1.3.5)
View SourceCSV file parsing functions
Author: Serge Aleynikov saleyn@gmail.com Copyright: 2021 Serge Aleynikov
Summary
Functions
Guess the type of data by its value
Guess data types of fields in the given CSV list of rows obtained by parsing a
CSV file with parse(File,[fix_lengths]). The function returns a list of tuples
{Type, MaxFieldLen, NumOfNulls}, where the Type is a field type,
MaxFieldLen is the max length of data in this column, and NumOfNulls is the
number of rows with empty values in this column.
Load CSV data from a File to a MySQL database. Tab is the name of a table
where to load data. MySqlPid is the pid of a MySQL database connection
returned by mysql:start_link/1. The data in the table is replaced according to
{import_type, Type}
Get max field lengths for a list obtained by parsing a CSV file with
parse_csv_file(File,[fix_lengths]).
Parse a CSV file using default options.
Parse a given CSV file.
Parse a CSV line
Types
-type load_options() :: [{load_type, recreate | replace | ignore_dups | update_dups} | {col_types, #{binary() => ColType :: atom() | {ColType :: atom(), ColLen :: integer()}}} | {batch_size, integer()} | {blob_size, integer()} | {create_table, boolean()} | {save_create_sql_to_file, string()} | {transforms, #{binary() => fun((term()) -> term())}} | {guess_types, boolean()} | {guess_limit_rows, integer()} | {max_nulls_pcnt, float()} | {primary_key, PKColumns :: binary() | [binary() | list()]} | {indexes, [{IdxName :: string() | binary(), [Col :: string() | binary()]} | {IdxName :: string() | binary(), [Col :: string() | binary()], unique | undefined}]} | {drop_temp_table, boolean()} | {encoding, string() | atom()} | {verbose, boolean() | integer()}].
Options for loading data to a database.
{load_type, Type}— Type of loading to perform.recreatewill replace the table by atomically dropping the old one, creating/loading the new one, and replacing the table.replacewill do an insert by usingREPLACE INTOstatement.ignore_dupswill useINSERT IGNORE INTOstatement to ignore records with duplicate keys.update_dupswill do anINSERT INTOandON DUPLICATE KEY UPDATE, so that the old records are updated and the new ones are inserted.{create_table, Allow}— Allow to create a table if it doesn't exist{col_types, Map}— Types of data for all or some columens. The Map is in the format:ColName::binary() => ColInfo, where ColInfo isColType | {ColType, ColLen::integer()}, andColTypeis:date | datetime | integer | float | blob | number.{transforms, Map}— Value transformation function for columns. The Map is in the format:ColName::binary() => fun((Value::term()) -> term()).{batch_size, Size}— Number of records per SQL insert/update/replace call{blob_size, Size}— Threshold in number of bytes at which a VARCHAR field is defined as BLOB{save_create_sql_to_file, Filename::string()}— Save CREATE TABLE sql statement to a fileguess_types— When specified, the function will try to guess the type of data in columns instead of treating all data as string fields. The possible data typed guessed: integer, float, date, datetime, number, string{guess_limit_rows, Limit}— Limit the number of rows for guessing the column data types{max_nulls_pcnt, Percent}— A percentage threshold of permissible NULLs in a column (0-100), above which the column data type is forced to be treated asstring{primary_key, Fields}— Names of primary key fields in the created table{indexes, Indexes}— Indexes to create{drop_temp_table, boolean()}— When true (default), temp table is dropped.{encoding, Encoding}— The name of the encoding to use for storing data. For the list of permissible values see this linkverbose— Print additional details to stdout
-type parse_options() :: [fix_lengths | binary | list | {open, list()} | {columns, [binary() | string()]} | {converters, [{binary() | string() | all, fun((binary(), binary()) -> binary()) | {rex, binary(), binary()}}]}].
CSV Parsing Options.
fix_lengths— if a record has a column count greater than what's found in the header row, those extra columns will be dropped, and if a row has fewer columns, empty columns will be added.binary— Return fields as binaries (default)list— Return fields as lists{open, list()}— Options given to file:open/2{columns, Names}— Return data only in given columns{converters, [{Col, fun((ColName, Value) -> NewValue)| {rex, RegEx, Replace}]}— Data format converter. IfColisall, the same converting function is used for all columns. If the converter is a{rex, RegEx, Replace}then the regular expression replacement will be run on a value in the requested column.
Functions
-spec guess_data_type(binary()) -> {null | date | datetime | integer | float | string, term(), string()}.
Guess the type of data by its value
-spec guess_data_types(HasHeaderRow :: boolean(), Rows :: [Fields :: [binary()]]) -> {Type :: string | integer | number | float | date | datetime, MaxFieldLen :: integer(), NumOfNulls :: integer()}.
Guess data types of fields in the given CSV list of rows obtained by parsing a
CSV file with parse(File,[fix_lengths]). The function returns a list of tuples
{Type, MaxFieldLen, NumOfNulls}, where the Type is a field type,
MaxFieldLen is the max length of data in this column, and NumOfNulls is the
number of rows with empty values in this column.
-spec load_to_mysql(File :: string(), Tab :: string(), MySqlPid :: pid(), Opts :: load_options()) -> {Columns :: list(), AffectedCount :: integer(), RecCount :: integer()}.
Load CSV data from a File to a MySQL database. Tab is the name of a table
where to load data. MySqlPid is the pid of a MySQL database connection
returned by mysql:start_link/1. The data in the table is replaced according to
{import_type, Type}:
recreate- The table is entirely replaced with the data from file.replace- Use "REPLACE INTO" instead of "INSERT INTO" existing tableignore_dups- The insert in the existing table is performed and theupdate_dups- The insert in the existing table is performed and theupsert- The insert/update in the existing table is performed
NOTE: this function requires [https://github.com/mysql-otp/mysql-otp.git]
-spec max_field_lengths(HasHeaderRow :: boolean(), Rows :: [Fields :: list()]) -> [Len :: integer()].
Get max field lengths for a list obtained by parsing a CSV file with
parse_csv_file(File,[fix_lengths]).
Parse a CSV file using default options.
-spec parse(binary() | string(), parse_options()) -> [[string()]].
Parse a given CSV file.
Parse a CSV line