Developing a heart attack prediction system involves leveraging machine learning algorithms to analyze patient data and assess the risk of a heart attack. Below is a step-by-step guide to creating such a system using Python, along with code snippets and references to existing projects.
1. Data Collection:
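Use a dataset of relevant clinical parameters. The Heart Failure Prediction Dataset on Kaggle is a suitable choice: it has 11 clinical features (such as age, sex, chest pain type, resting blood pressure, and cholesterol) and a binary target column, HeartDisease, that indicates whether heart disease is present.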
2. Data Preprocessing:
Load the dataset and preprocess it by handling missing values, encoding categorical variables, and scaling numerical features.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
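# Load dataset (adjust the path to wherever the Kaggle CSV was saved)
data = pd.read_csv('heart_failure_prediction.csv')
# Drop rows with missing values, if any
data = data.dropna()
# One-hot encode categorical columns; numeric columns pass through unchanged
data = pd.get_dummies(data, drop_first=True)
# Split data into features and target
# ('HeartDisease' is the label column in the Kaggle Heart Failure Prediction Dataset;
# change it if your file names the label differently, e.g. 'target')
X = data.drop('HeartDisease', axis=1)
y = data['HeartDisease']
# Split data into training and testing sets, keeping the class balance in both
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# Scale features: fit the scaler on the training set only, then reuse it on the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)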
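The preprocessing steps above can also be bundled into a single scikit-learn Pipeline, so that encoding and scaling are learned from the training split only and are applied identically at prediction time. The following is a minimal sketch of that alternative, not the method used in the snippet above. The small DataFrame and its column names (age, cholesterol, sex, target) are made-up stand-ins for the real CSV.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny synthetic frame standing in for the real CSV (all values are made up).
df = pd.DataFrame({
    "age": [40, 49, 37, 48, 54, 39, 45, 58, 61, 52, 43, 60],
    "cholesterol": [289, 180, 283, 214, 195, 339, 237, 164, 250, 284, 201, 248],
    "sex": ["M", "F", "M", "F", "M", "M", "F", "M", "M", "F", "F", "M"],
    "target": [0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1],
})

numeric_cols = ["age", "cholesterol"]   # scaled
categorical_cols = ["sex"]              # one-hot encoded

X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Apply a different transformer to each group of columns.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Chain preprocessing and the classifier into one estimator.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

pipeline.fit(X_train, y_train)
test_predictions = pipeline.predict(X_test)
print(test_predictions)
```

Because the pipeline accepts raw rows, the same object can later be given a one-row DataFrame with the original column names, without repeating the encoding and scaling by hand.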
3. Model Selection:
Choose a classification algorithm suitable for the dataset. Logistic Regression is a common choice for binary classification problems.
from sklearn.linear_model import LogisticRegression
# Initialize the model
model = LogisticRegression()
# Train the model
model.fit(X_train, y_train)
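One way to check whether Logistic Regression is a reasonable choice is to compare it against another classifier using cross-validation. The sketch below does this on synthetic data from make_classification, used here only so the example runs without the CSV. The random forest is an arbitrary second candidate chosen for illustration, and the scores it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data with 11 features (a stand-in for the real data).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=11, n_informative=5, random_state=42
)

# Candidate models; scaling is placed inside the pipeline so that each
# cross-validation fold fits its own scaler on its own training portion.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

cv_auc = {}
for name, estimator in candidates.items():
    # 5-fold cross-validated ROC AUC for each candidate.
    scores = cross_val_score(estimator, X_demo, y_demo, cv=5, scoring="roc_auc")
    cv_auc[name] = scores.mean()
    print(f"{name}: mean ROC AUC = {cv_auc[name]:.3f}")
```

To run the comparison on the real data, replace X_demo and y_demo with the unscaled feature matrix and labels.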
4. Model Evaluation:
Assess the model's performance using metrics such as accuracy, precision, recall, and the area under the ROC curve (AUC).
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
# Predict class labels and positive-class probabilities on test data
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]
# Calculate metrics (ROC AUC is computed from probabilities, not hard labels)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_prob)
print(f'Accuracy: {accuracy:.3f}')
print(f'Precision: {precision:.3f}')
print(f'Recall: {recall:.3f}')
print(f'ROC AUC: {roc_auc:.3f}')
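To make these metrics concrete, here is a small worked example on six hand-written labels and probabilities (made-up values, not model output). It also shows why ROC AUC should be computed from predicted probabilities: thresholding to hard labels first discards the ranking information the metric relies on.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up ground truth and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.90, 0.60]

# Hard labels at the usual 0.5 threshold: [0, 0, 0, 1, 1, 1]
y_pred = [int(p >= 0.5) for p in y_prob]

accuracy = accuracy_score(y_true, y_pred)    # 4 of 6 correct
precision = precision_score(y_true, y_pred)  # 2 true positives of 3 predicted positives
recall = recall_score(y_true, y_pred)        # 2 true positives of 3 actual positives

# AUC from probabilities: 7 of the 9 (positive, negative) pairs are ranked correctly.
auc_from_probabilities = roc_auc_score(y_true, y_prob)
# AUC from hard labels: ties between equal labels count as half, giving 6 of 9.
auc_from_labels = roc_auc_score(y_true, y_pred)

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)

print(f"Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}")
print(f"AUC (probabilities): {auc_from_probabilities:.3f}")
print(f"AUC (hard labels):   {auc_from_labels:.3f}")
print(cm)
```

In a screening setting, recall is often watched most closely, since a false negative (the single FN in the matrix above) corresponds to a missed at-risk patient.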
5. Deployment:
Deploy the model using a web framework like Streamlit to create an interactive user interface.
import streamlit as st
import numpy as np
# Note: `scaler` and `model` must be the fitted objects from the steps above,
# for example loaded from disk when the app starts
# Function to make predictions
def predict_heart_attack(features):
    # Reshape the flat list into a single row: (1, n_features)
    features = np.array(features).reshape(1, -1)
    features = scaler.transform(features)
    prediction = model.predict(features)
    return prediction
# Streamlit interface
st.title('Heart Attack Prediction')
age = st.number_input('Age')
sex = st.selectbox('Sex', [0, 1])
cp = st.selectbox('Chest Pain Type', [0, 1, 2, 3])
# Add other input fields as necessary
if st.button('Predict'):
    # The list must contain every feature the model was trained on,
    # in the same order as the training columns (including encoded columns)
    features = [age, sex, cp]  # Add other features in the correct order
    prediction = predict_heart_attack(features)
    if prediction[0] == 1:
        st.write('High risk of heart attack.')
    else:
        st.write('Low risk of heart attack.')
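The app code above uses scaler and model, which only exist in the training script. One common way to bridge the two is to save the fitted objects with joblib and load them when the app starts. The sketch below assumes this approach. The file name heart_model.joblib, the three-feature synthetic data, and the use of a combined scaler-plus-model pipeline are all illustrative choices, not part of the original steps.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- Training script side -------------------------------------------------
# Synthetic data standing in for the real training set (values are random).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# Keep the scaler and classifier together so they are saved as one object.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_demo, y_demo)

# Hypothetical file location; a real project would use a path inside the repo.
model_path = os.path.join(tempfile.gettempdir(), "heart_model.joblib")
joblib.dump(pipeline, model_path)

# --- App side -------------------------------------------------------------
# Load the fitted pipeline once at startup, then reuse it for every request.
loaded = joblib.load(model_path)

sample = np.array([[0.5, 1.2, -0.3]])  # one row, same feature order as training
risk_probability = loaded.predict_proba(sample)[0, 1]
print(f"Estimated probability of the positive class: {risk_probability:.2f}")
```

Showing the probability from predict_proba in the interface, rather than only a high or low label, lets the user see how close a case is to the decision threshold.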
Existing Projects and Resources:
- Heart Attack Analysis & Prediction: A project that predicts the likelihood of a heart attack based on clinical parameters, deployed using Streamlit.
- Heart Attack Prediction Using Logistic Regression: An article detailing the use of Logistic Regression for heart disease prediction.
- Heart Attack Prediction With Machine Learning: A Medium article exploring a heart attack risk prediction model built using Python.