Data Science

Contact us now!

If you have a ready-made project plan, or just an idea for one, leave your contact information. We will get in touch and help you implement it!

    Our work

    E-commerce store data warehouse

    Technologies: Python, MySQL, JSON, Excel, Power BI, Jupyter Notebook

    In this project we developed a system for gathering and analysing information, together with a system for forecasting supplies and sales. Sales, stock balance and event data are gathered from the online marketplaces Wildberries, OZON, Yandex.Market and SberMegaMarket. Data that is not available through the marketplaces' APIs is uploaded to the system manually as XLS files. A data cleaning process was created as well. Client-side data analysts were given a dedicated interface for modelling and further testing with business analytics tools such as Power BI. The prediction model in this project uses ML algorithms.

    Reviewing legal documents using NLP techniques

    Technologies: Python, NLP, Machine Learning
    Duration: 3 months

    Our client builds financial software (named ‘CF Engine’) for modelling complex financial products (RMBS, ABS, CLO, etc.). The main goal of the project is to extend this software with a feature that lets users review the related legal documents based on information from the model. The developed model needs to check whether a given document corresponds to one of the created models. For example, if the processed document is a mortgage, the model:

    - parses the mortgage document (from PDF, Word or plain text format);
    - checks that the document contains all required information (all parties are specified and described correctly, the property is described, the interest rate is specified, all information required by law is provided, and so on);
    - if the document fits the model, extracts the important information (parties, property description, interest rates and so on) and provides it as a summary for user review.

    The system supports different input formats and different types of documents, such as mortgages, car loans and commercial loans. It also supports different countries of operation, i.e. a different document structure for each country, and different languages.
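The checking and extraction steps described above can be illustrated with a minimal sketch. It assumes the document has already been converted to plain text, and it uses simple regular expressions in place of the NLP and machine learning techniques used in the project. The field names, patterns and sample text are hypothetical.

```python
import re

# Hypothetical required fields for a mortgage document. The patterns are
# illustrative assumptions, not the rules used in the actual project.
REQUIRED_FIELDS = {
    "lender": re.compile(r"Lender:\s*(.+)"),
    "borrower": re.compile(r"Borrower:\s*(.+)"),
    "property": re.compile(r"Property:\s*(.+)"),
    "interest_rate": re.compile(r"Interest rate:\s*([\d.]+)\s*%"),
}


def review_document(text):
    """Return (fits_model, summary, missing) for one plain-text document."""
    summary, missing = {}, []
    for name, pattern in REQUIRED_FIELDS.items():
        match = pattern.search(text)
        if match:
            # Keep the extracted value for the user-facing summary.
            summary[name] = match.group(1).strip()
        else:
            missing.append(name)
    # The document fits the model only if no required field is missing.
    return not missing, summary, missing


sample = """Lender: Example Bank
Borrower: Jane Doe
Property: 12 Sample Street
Interest rate: 4.5 % per annum"""

fits, summary, missing = review_document(sample)
```

A document that lacks a required field is reported as not fitting the model, and the list of missing fields tells the reviewer what to look for.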

    Sports prediction software

    Technologies: Python, GLPK
    Duration: 2 months

    The business goal of this project was to build a tool that creates the best possible lineup of fantasy sports players given their projected fantasy points for the next game. Lineups were built for several sports (NBA, NFL and MLB) and several daily fantasy sports (DFS) websites (FanDuel and DraftKings). The tool can build a list of optimal lineups from the same set of players and their projected fantasy points. Among other options, it can build the best lineups with one or more players already present (locked) in the lineup. It is also possible to constrain each player's exposure across the list of optimal lineups, the minimum and maximum number of players from each team in a lineup, and more.

    One of the technical problems we solved was finding a linear programming model that describes the lineup and all its constraints correctly and efficiently. The project implemented a mixed integer linear programming model, which was solved with the GNU Linear Programming Kit (GLPK) solver through the Python library Pyomo. The tool was implemented as a web service. The end user populated the model parameters dynamically with JSON data through a REST-like API, and the best lineups were returned to the user as JSON data as well. The linear programming model actually consisted of six models, one for each combination of sport (NBA, NFL, MLB) and DFS website (FanDuel, DraftKings).
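To make the kind of constraints described above concrete, here is a minimal sketch. It does not use Pyomo or GLPK: it swaps the mixed integer linear programming model for an exhaustive search over a tiny, hypothetical player pool, which is only practical for toy inputs. The salary cap, player data and function name are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical player pool: (name, team, salary, projected fantasy points).
PLAYERS = [
    ("A", "X", 9000, 45.0),
    ("B", "X", 7000, 38.0),
    ("C", "Y", 6000, 30.0),
    ("D", "Y", 5000, 27.0),
    ("E", "Z", 4000, 20.0),
]


def best_lineup(players, size, salary_cap, locked=(), max_per_team=None):
    """Exhaustively search for the lineup with the most projected points."""
    best, best_points = None, float("-inf")
    for lineup in combinations(players, size):
        names = {p[0] for p in lineup}
        # Locked players must all be present in the lineup.
        if not set(locked) <= names:
            continue
        # Assumed constraint: total salary must not exceed the cap.
        if sum(p[2] for p in lineup) > salary_cap:
            continue
        # Optional limit on the number of players from one team.
        if max_per_team is not None:
            if max(Counter(p[1] for p in lineup).values()) > max_per_team:
                continue
        points = sum(p[3] for p in lineup)
        if points > best_points:
            best, best_points = lineup, points
    return best, best_points


lineup, points = best_lineup(PLAYERS, size=3, salary_cap=20000)
locked_lineup, locked_points = best_lineup(
    PLAYERS, size=3, salary_cap=20000, locked=("D",)
)
```

An integer programming formulation expresses the same rules as linear constraints over binary "player is selected" variables, which lets a solver handle realistic pool sizes where enumeration is infeasible.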

    Looking for correlations and making prediction in e-commerce system

    Technologies: Python, Scikit-learn
    Duration: 5 months

    We implemented a system that ranks customers by projections critical for business actions, such as upgrades from free to premium accounts, payment amounts for premium accounts, and how likely a person is to stop using the service. The system is trained on a database of user account information and the series of actions each user made in the service. When new user data is provided to the system, it produces a ranking based on projected future actions. We used the following techniques:

    - feature engineering for machine learning;
    - random forest/decision tree models;
    - a "bag of words"-like model for user actions.
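As an illustration of the "bag of words"-like representation of user actions, the sketch below turns each user's action series into a vector of action counts. The action names and users are hypothetical, and the model training step is not shown. Vectors like these could then be passed to a tree-based model such as a random forest.

```python
from collections import Counter


def action_counts(actions, vocabulary):
    """Count how often each known action occurs, ignoring order."""
    counts = Counter(actions)
    return [counts[action] for action in vocabulary]


# Hypothetical action logs, one ordered series per user.
sessions = {
    "user_1": ["login", "view_pricing", "view_pricing", "upgrade"],
    "user_2": ["login", "logout"],
}

# A fixed, sorted vocabulary gives every user a vector of the same length.
vocabulary = sorted({a for actions in sessions.values() for a in actions})
features = {
    user: action_counts(actions, vocabulary)
    for user, actions in sessions.items()
}
```

As with bag of words for text, the order of actions is discarded and only their frequencies are kept.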

    Tank Level Prediction Algorithm

    Technologies: Python, Scikit, SciPy
    Duration: 6 months

    The goal of the project was to predict the level of fluid in a tank from the data returned by a sensor connected to the tank after a strike is generated by a specific device. The data provided by the client contains sensor readings at various tank levels. After applying principal component analysis (PCA) to extract patterns from the signals in the spectral domain, a simple logistic regression model was created. The model demonstrated good results and was used to predict levels with 98%-100% accuracy.

    In short: we developed a predictive analytics solution for the oil industry that predicts the oil level from sensor data.
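The sketch below illustrates only the first step of such a pipeline: moving a sensor signal into the spectral domain. It uses a naive discrete Fourier transform on a synthetic sine wave, which is an assumption standing in for the real sensor data. The PCA and logistic regression steps are not shown.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive DFT: normalized magnitudes for bins 0 .. n // 2."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))) / n
        for k in range(n // 2 + 1)
    ]


# Synthetic stand-in for a sensor response: 5 cycles over 64 samples.
n = 64
signal = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]

spectrum = magnitude_spectrum(signal)
# Index of the strongest frequency component.
dominant_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

In a pipeline like the one described, spectra of this kind, computed for many recorded strikes, would form the feature matrix that a dimensionality reduction step such as PCA compresses before classification.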