The final project for completing my data scientist training is to take a topic I strongly identify with, build a predictive model in Python, forecast the next periods, and visualize the results with Business Intelligence tools.
Objective
Develop predictive models by testing several candidate models and choosing the best-performing one for use within BI.
Project execution
For this project, a search for a suitable database was carried out first. The intention from the beginning was to create a predictive sales model. Once a database from a retail company was found, the modeling process could start.
After saving the data on the GitHub platform, I identified that the database had the following fields: date of sale, price, daily stock volume and sales volume.
After this initial analysis, the ETL process began. The database was loaded using the pandas library, and the following adjustments were made for modeling: the 'date' field, which was stored as an object, was converted to a date; the string fields were removed so the models could be trained; and variables derived from the date were added, such as week, day of the week, month, and year.
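The adjustments described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows, the column names (`date`, `price`, `stock`, `sales`, `store`) and the helper `prepare` are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are assumptions,
# not the real schema or data of the retail database.
raw = pd.DataFrame({
    "date": ["2014-01-01", "2014-01-02", "2014-01-03"],
    "price": [1.29, 1.29, 1.09],
    "stock": [4972, 4902, 4704],
    "sales": [0, 70, 59],
    "store": ["A", "A", "A"],  # stands in for a string field
})

def prepare(df):
    """Convert the date column, add calendar features, drop string columns."""
    out = df.copy()
    # The date arrives as text (object dtype), so parse it into datetimes.
    out["date"] = pd.to_datetime(out["date"])
    # Variables derived from the date.
    out["week"] = out["date"].dt.isocalendar().week.astype(int)
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday = 0
    out["month"] = out["date"].dt.month
    out["year"] = out["date"].dt.year
    # Drop every column that is neither numeric nor a datetime (the string fields).
    text_cols = out.select_dtypes(exclude=["number", "datetime"]).columns
    return out.drop(columns=text_cols)

prepared = prepare(raw)
print(prepared.dtypes)
```

Depending on the estimator, the parsed `date` column itself may also need to be dropped or converted to a number before training, since the derived columns already carry its information.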
After this preprocessing, an exploratory analysis of the data was carried out to identify patterns that could bring insights to the model.
With the scikit-learn library, the data was split into training and test sets so that the predictive models could be fitted and their performance evaluated.
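A split of this kind might look like the sketch below. The feature matrix is random placeholder data, and the 80/20 ratio and `shuffle=False` are assumptions; the post does not say which settings were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the prepared features and the sales target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.normal(size=100)

# shuffle=False keeps the rows in their original (chronological) order,
# so the test set is the most recent 20% of the observations.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print(X_train.shape, X_test.shape)
```

For sales data ordered by date, keeping the order avoids training on days that come after the ones being predicted. A shuffled split is also possible but tends to give a more optimistic score for forecasting tasks.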
Still using scikit-learn, we ran five analyses on the data: linear regression, non-linear regression (polynomial of degree 2), decision tree regression, random forest regression and MLP (multilayer perceptron) neural networks. After evaluating the five models, we saw that none of them obtained a satisfactory score to solve the problem.
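The comparison of the five model families could be set up along these lines. This is only a sketch on synthetic data: the hyperparameters, the scaling step in front of the MLP, and the use of R² (what `score` returns for scikit-learn regressors) are assumptions, since the post does not state which settings or metric were used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with a quadratic term, used only to make the sketch runnable.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# One estimator per model family named in the post; settings are illustrative.
models = {
    "linear": LinearRegression(),
    "polynomial (degree 2)": make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
    "decision tree": DecisionTreeRegressor(random_state=42),
    "random forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=42),
    ),
}

# Fit each model on the training set and record its R^2 on the test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

Keeping the models in a dictionary makes it easy to evaluate them all on the same split and to add further candidates when none of the first attempts scores well enough.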
Developing predictive models with Python