2 Read Files

The first step was to read the files and organize them. In each file, some columns had to be translated and some data reordered to facilitate the exploratory analysis work.

data

data

2.1 Accounts

Here we find the account data. There are 4500 records, each containing information about the account district, frequency of statement issuance, and date of account creation.

To make the data easier to understand, we translate Czech frequency values into English and separate dates with hyphens.

Table 2.1: accounts frame
account_id district_id frequency date
576 55 monthly 1993-01-01
3818 74 monthly 1993-01-01
704 55 monthly 1993-01-01
2378 16 monthly 1993-01-01
2632 24 monthly 1993-01-02

2.2 Clients

The Clients list contains 5369 bank customers, tabulated with the district code and their birthday code - which informs the birthday and gender. For the analysis, we separated the date of birth and gender in separate columns.

Table 2.2: clients frame
client_id birth_number district_id gender_code gender birth_date
1 706213 18 1 W 1970-12-13
2 450204 1 0 M 1945-02-04
3 406009 1 1 W 1940-10-09
4 561201 5 0 M 1956-12-01
5 605703 5 1 W 1960-07-03

2.3 Disposition

The Disposition relationship contains interactions between customers and accounts, classifying as owner / dependent. Some explanations of possible account-owner-dependency interactions: Every account has an owner and may or may not have dependents. Dependents may own another account. A customer may own more than one account.

Table 2.3: disposition frame
disp_id client_id account_id type
1 1 1 OWNER
2 2 2 OWNER
3 3 2 DISPONENT
4 4 3 OWNER
5 5 3 DISPONENT

2.4 Order

List of payment orders issued to accounts. In addition to the money order code and the account code, the records contain information about the bank and account you received, the amount and type of payment (household, insurance, leasing and loan). To make it easier to understand, it was necessary to translate Czech payment types into English.

Table 2.4: order frame
order_id account_id bank_to account_to amount k_symbol
29401 1 YZ 87144583 2452.0 household
29402 2 ST 89597016 3372.7 loan
29403 2 QR 13943797 7266.0 household
29404 3 WX 83084338 1135.0 household
29405 3 CD 24485939 327.0

2.5 Transactions

List of all transactions made by customers, informing: Account that performed the transaction Transaction Date Transaction Type (Inbound and Outbound) Transaction (credit card withdrawal, credit in cash, collection from another bank, withdrawal in cash, remittance to another bank) Amount (transaction amount) Balance after transaction K-symbol (characterization of transaction) Bank (receiving bank) Account (receiving account)

Table 2.5: transaction frame
trans_id account_id date type operation amount balance k_symbol bank account
695247 2378 1993-01-01 credit credit in cash 700 700 NA
171812 576 1993-01-01 credit credit in cash 900 900 NA
207264 704 1993-01-01 credit credit in cash 1000 1000 NA
1117247 3818 1993-01-01 credit credit in cash 600 600 NA
579373 1972 1993-01-02 credit credit in cash 400 400 NA

2.6 Loans

Loan list, with 682 occurrences. Each customer can only receive one loan. In the table we have Loan key Account Key Date Value Duration (in months) Installment Payment Amount Status (finished - ok; Finished - not ok; Running - ok; Running - in debt) To make understanding easier during the analysis, we classify the status according to its conditions rather than leaving the original groups A through D from the database.

Table 2.6: loan frame
loan_id account_id date amount duration payments status
5314 1787 1993-07-05 96396 12 8033 Finished - not payed
5316 1801 1993-07-11 165960 36 4610 Finished - OK
6863 9188 1993-07-28 127080 60 2118 Finished - OK
5325 1843 1993-08-03 105804 36 2939 Finished - OK
7240 11013 1993-09-06 274740 60 4579 Finished - OK

2.7 Credit Card

The credit card list contains 892 occurrences and stores the card code, disposition id, card type (junior / classic / gold) and date of issue. Treatment was performed so that the issue date was in the Brazilian model.

Table 2.7: card frame
card_id disp_id type issued
1005 9285 classic 1993-11-07
104 588 classic 1994-01-19
747 4915 classic 1994-02-05
70 439 classic 1994-02-08
577 3687 classic 1994-02-15

2.8 Demographic data

Finally, demographic data records 77 municipalities distributed in the 7 regions of the Czech Republic. The table contains the following elements:

District code;

District name;

Region;

Nº of inhabitants

Nº of municipalities with <499 inhabitants

Nº of municipalities with 500-1999 inhabitants

Nº of municipalities with 2000-9999 inhabitants

Nº of municipalities with> 1000 inhabitants

Nº of cities

Proportion of urban inhabitants

Average Salary

unemployment rate ‘95

Unemployment Rate ‘96

Nº of entrepreneurs per 1000 inhabitants

Nº of crimes committed in ‘95

Nº of crimes committed in ‘96

Table 2.8: demographic frame
region district_name inhabitants
Prague Hl.m. Praha 1204953
central Bohemia Benesov 88884
central Bohemia Beroun 75232
central Bohemia Kladno 149893
central Bohemia Kolin 95616