To detect and track targets in real time from video images, this work improves the network structure of the lightweight deep learning target detection network SSD_Mobilenetv1. Finer-grained feature maps take part in position regression and classification so that the network's context information is integrated, and an inverted residual module is introduced to improve the network's ability to extract features. Experiments show that detection accuracy improves while real-time detection speed is maintained, and training and validation on the KITTI dataset achieve good results.
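
As a rough illustration of the inverted residual module mentioned above, the following sketch counts the convolution weights in one such block, following the usual MobileNetV2-style layout (1x1 expansion, 3x3 depthwise convolution, 1x1 linear projection). The channel widths, the expansion factor of 6, and the function names are assumptions chosen for illustration; the source does not specify them. Biases and batch-normalization parameters are ignored.

```python
def depthwise_separable_params(c_in, c_out, kernel=3):
    """Weights in a MobileNetV1-style depthwise separable convolution."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution mixing channels
    return depthwise + pointwise


def inverted_residual_params(c_in, c_out, expansion=6, kernel=3):
    """Weights in a MobileNetV2-style inverted residual block.

    The expansion factor is an assumed value, not taken from the source.
    """
    hidden = c_in * expansion
    expand = c_in * hidden                # 1x1 expansion to a wider layer
    depthwise = kernel * kernel * hidden  # k x k depthwise convolution
    project = hidden * c_out              # 1x1 linear projection back down
    return expand + depthwise + project


def uses_skip_connection(c_in, c_out, stride):
    """The residual shortcut applies only when input and output shapes match."""
    return stride == 1 and c_in == c_out


if __name__ == "__main__":
    # Hypothetical 32-channel layer, used only to show the calculation.
    print(depthwise_separable_params(32, 32))
    print(inverted_residual_params(32, 32))
    print(uses_skip_connection(32, 32, stride=1))
```

The block widens the representation before the depthwise convolution and narrows it again afterwards, so it carries more weights than a plain depthwise separable layer of the same input and output width; the shortcut connects the two narrow ends.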