2 Read Files
The first step was to read the files and organize them. In each file, some columns had to be translated and some data reordered to facilitate the exploratory analysis work.

data
2.1 Accounts
Here we find the account data. There are 4500 records, each containing information about the account district, frequency of statement issuance, and date of account creation.
To make the data easier to understand, we translate Czech frequency values into English and separate dates with hyphens.
account_id | district_id | frequency | date |
---|---|---|---|
576 | 55 | monthly | 1993-01-01 |
3818 | 74 | monthly | 1993-01-01 |
704 | 55 | monthly | 1993-01-01 |
2378 | 16 | monthly | 1993-01-01 |
2632 | 24 | monthly | 1993-01-02 |
2.2 Clients
The Clients list contains 5369 bank customers, tabulated with the district code and their birthday code - which informs the birthday and gender. For the analysis, we separated the date of birth and gender in separate columns.
client_id | birth_number | district_id | gender_code | gender | birth_date |
---|---|---|---|---|---|
1 | 706213 | 18 | 1 | W | 1970-12-13 |
2 | 450204 | 1 | 0 | M | 1945-02-04 |
3 | 406009 | 1 | 1 | W | 1940-10-09 |
4 | 561201 | 5 | 0 | M | 1956-12-01 |
5 | 605703 | 5 | 1 | W | 1960-07-03 |
2.3 Disposition
The Disposition relationship contains interactions between customers and accounts, classifying as owner / dependent. Some explanations of possible account-owner-dependency interactions: Every account has an owner and may or may not have dependents. Dependents may own another account. A customer may own more than one account.
disp_id | client_id | account_id | type |
---|---|---|---|
1 | 1 | 1 | OWNER |
2 | 2 | 2 | OWNER |
3 | 3 | 2 | DISPONENT |
4 | 4 | 3 | OWNER |
5 | 5 | 3 | DISPONENT |
2.4 Order
List of payment orders issued to accounts. In addition to the money order code and the account code, the records contain information about the bank and account you received, the amount and type of payment (household, insurance, leasing and loan). To make it easier to understand, it was necessary to translate Czech payment types into English.
order_id | account_id | bank_to | account_to | amount | k_symbol |
---|---|---|---|---|---|
29401 | 1 | YZ | 87144583 | 2452.0 | household |
29402 | 2 | ST | 89597016 | 3372.7 | loan |
29403 | 2 | QR | 13943797 | 7266.0 | household |
29404 | 3 | WX | 83084338 | 1135.0 | household |
29405 | 3 | CD | 24485939 | 327.0 |
2.5 Transactions
List of all transactions made by customers, informing: Account that performed the transaction Transaction Date Transaction Type (Inbound and Outbound) Transaction (credit card withdrawal, credit in cash, collection from another bank, withdrawal in cash, remittance to another bank) Amount (transaction amount) Balance after transaction K-symbol (characterization of transaction) Bank (receiving bank) Account (receiving account)
trans_id | account_id | date | type | operation | amount | balance | k_symbol | bank | account |
---|---|---|---|---|---|---|---|---|---|
695247 | 2378 | 1993-01-01 | credit | credit in cash | 700 | 700 | NA | ||
171812 | 576 | 1993-01-01 | credit | credit in cash | 900 | 900 | NA | ||
207264 | 704 | 1993-01-01 | credit | credit in cash | 1000 | 1000 | NA | ||
1117247 | 3818 | 1993-01-01 | credit | credit in cash | 600 | 600 | NA | ||
579373 | 1972 | 1993-01-02 | credit | credit in cash | 400 | 400 | NA |
2.6 Loans
Loan list, with 682 occurrences. Each customer can only receive one loan. In the table we have Loan key Account Key Date Value Duration (in months) Installment Payment Amount Status (finished - ok; Finished - not ok; Running - ok; Running - in debt) To make understanding easier during the analysis, we classify the status according to its conditions rather than leaving the original groups A through D from the database.
loan_id | account_id | date | amount | duration | payments | status |
---|---|---|---|---|---|---|
5314 | 1787 | 1993-07-05 | 96396 | 12 | 8033 | Finished - not payed |
5316 | 1801 | 1993-07-11 | 165960 | 36 | 4610 | Finished - OK |
6863 | 9188 | 1993-07-28 | 127080 | 60 | 2118 | Finished - OK |
5325 | 1843 | 1993-08-03 | 105804 | 36 | 2939 | Finished - OK |
7240 | 11013 | 1993-09-06 | 274740 | 60 | 4579 | Finished - OK |
2.7 Credit Card
The credit card list contains 892 occurrences and stores the card code, disposition id, card type (junior / classic / gold) and date of issue. Treatment was performed so that the issue date was in the Brazilian model.
card_id | disp_id | type | issued |
---|---|---|---|
1005 | 9285 | classic | 1993-11-07 |
104 | 588 | classic | 1994-01-19 |
747 | 4915 | classic | 1994-02-05 |
70 | 439 | classic | 1994-02-08 |
577 | 3687 | classic | 1994-02-15 |
2.8 Demographic data
Finally, demographic data records 77 municipalities distributed in the 7 regions of the Czech Republic. The table contains the following elements:
District code;
District name;
Region;
Nº of inhabitants
Nº of municipalities with <499 inhabitants
Nº of municipalities with 500-1999 inhabitants
Nº of municipalities with 2000-9999 inhabitants
Nº of municipalities with> 1000 inhabitants
Nº of cities
Proportion of urban inhabitants
Average Salary
unemployment rate ‘95
Unemployment Rate ‘96
Nº of entrepreneurs per 1000 inhabitants
Nº of crimes committed in ‘95
Nº of crimes committed in ‘96
region | district_name | inhabitants |
---|---|---|
Prague | Hl.m. Praha | 1204953 |
central Bohemia | Benesov | 88884 |
central Bohemia | Beroun | 75232 |
central Bohemia | Kladno | 149893 |
central Bohemia | Kolin | 95616 |