Optimization of Machine Learning Models for Big Data Applications

International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines

TITLE	Optimization of Machine Learning Models for Big Data Applications
ABSTRACT	We present a comprehensive analysis of scalable optimization techniques for machine learning (ML) on big data. We review recent (last 5 years) research on optimization algorithms, including stochastic first-order methods (SGD and momentum), adaptive methods (AdaGrad/RMSProp, Adam and its variants, LARS/LAMB), second-order and quasi-Newton techniques (Newton, Hessian-free, K-FAC, Shampoo), and variance-reduction methods (SVRG, SAGA, etc.), and discuss their convergence properties and practical trade-offs[1][2]. We also cover distributed and federated optimization: data-, model-, and pipeline-parallel training; asynchronous updates (e.g., Hogwild!); communication-efficient methods such as gradient quantization and sparsification[3][4]; and federated averaging (FedAvg) versus distributed synchronous SGD[5][6]. We identify benchmark "big data" datasets across vision (ImageNet: 14M images[7]), text (English Wikipedia: 3.9M articles, 2.24B tokens[8]; Common Crawl: billions of pages, ~400 TiB per month[9]), recommendation (Criteo CTR: from 45M to billions of examples[10]), and graphs (OGB-LSC MAG240M: 244M nodes, 1.7B edges[11]). We describe scalable frameworks (TensorFlow, PyTorch DDP, Horovod, Ray, Spark MLlib, etc.), summarizing their parallelism models, strengths, and limitations[12][13]. We propose experimental designs for comparing optimizers on large-scale tasks, specifying metrics (accuracy, throughput, time-to-accuracy[6], communication cost), hardware setup (GPU/TPU clusters), and statistical analysis plans. Identified research gaps include integrating adaptive and second-order techniques at scale, better asynchronous and federated algorithms, and hyperparameter search in distributed settings. We outline a potential paper structure, suggest high-impact ML and big-data venues (e.g., IEEE Transactions on Big Data, IEEE TKDE, ICML, NeurIPS, and MLSys), and list relevant research groups for peer review.
Tables compare optimization algorithms, datasets, and frameworks; a Gantt chart (Mermaid) sketches a project timeline.
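The abstract proposes comparing optimizers using time-to-accuracy as one metric. As a minimal, self-contained sketch (not the authors' experimental code), the toy example below runs mini-batch SGD with heavy-ball momentum and Adam on a synthetic least-squares problem and counts how many updates each needs to reach a target training loss. This step count is only a stand-in for time-to-accuracy; DAWNBench itself measures wall-clock training time to a target validation accuracy. The dataset, the helper names (`make_data`, `minibatch_grad`, `steps_to_target`, etc.), the target MSE of 0.05, and all hyperparameters are hypothetical choices for illustration, and only the Python standard library is used.

```python
# Hypothetical toy benchmark (illustration only): SGD+momentum vs. Adam on
# synthetic linear regression, measuring updates needed to reach a target MSE.
import math
import random


def make_data(n=1000, w_true=(2.0, -3.0), noise=0.1, seed=0):
    """Synthetic linear-regression data: y = w_true . x + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        y = sum(wi * xi for wi, xi in zip(w_true, x)) + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def predict(w, x):
    """Linear model prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    """Mean squared error over the full dataset."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def minibatch_grad(w, batch):
    """Gradient of the mini-batch mean squared error with respect to w."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj / len(batch)
    return g


class SGDMomentum:
    """Heavy-ball momentum: v <- beta*v + g, then w <- w - lr*v."""

    def __init__(self, lr=0.05, beta=0.9):
        self.lr, self.beta, self.v = lr, beta, None

    def step(self, w, g):
        if self.v is None:
            self.v = [0.0] * len(w)
        self.v = [self.beta * vi + gi for vi, gi in zip(self.v, g)]
        return [wi - self.lr * vi for wi, vi in zip(w, self.v)]


class Adam:
    """Adam update with bias-corrected first and second moment estimates."""

    def __init__(self, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, w, g):
        if self.m is None:
            self.m, self.v = [0.0] * len(w), [0.0] * len(w)
        self.t += 1
        self.m = [self.b1 * mi + (1 - self.b1) * gi for mi, gi in zip(self.m, g)]
        self.v = [self.b2 * vi + (1 - self.b2) * gi * gi for vi, gi in zip(self.v, g)]
        c1, c2 = 1 - self.b1 ** self.t, 1 - self.b2 ** self.t  # bias corrections
        return [wi - self.lr * (mi / c1) / (math.sqrt(vi / c2) + self.eps)
                for wi, mi, vi in zip(w, self.m, self.v)]


def steps_to_target(opt, data, target=0.05, batch_size=32, max_steps=5000, seed=1):
    """Count mini-batch updates until full-data MSE <= target (None if never)."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for step in range(1, max_steps + 1):
        batch = rng.sample(data, batch_size)
        w = opt.step(w, minibatch_grad(w, batch))
        if mse(w, data) <= target:
            return step, w
    return None, w


if __name__ == "__main__":
    data = make_data()
    for name, opt in [("SGD+momentum", SGDMomentum()), ("Adam", Adam())]:
        steps, w = steps_to_target(opt, data)
        print(f"{name}: steps to MSE <= 0.05: {steps}, w = {[round(v, 3) for v in w]}")
```

Counting updates rather than seconds keeps the sketch deterministic for a fixed seed; a faithful time-to-accuracy comparison would time each run on fixed hardware and evaluate a held-out metric, as the experimental design described in the abstract implies.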
AUTHOR	PRAJWAL M DIVATAGI, PRATEEK S KUDARI, MARUTHI B PUJAR (UG Students, Dept. of CSE, Jain Institute of Technology, Davangere, Karnataka, India); ARCHANA K N (Assistant Professor, Dept. of CSE, Jain Institute of Technology, Davangere, Karnataka, India)
VOLUME	183
DOI	10.15680/IJIRCCE.2026.1404118
PDF	pdf/118_Optimization of Machine Learning Models for Big Data Applications.pdf
KEYWORDS
References
1. SGD/Adaptive: Bottou (SGD); Duchi et al. (JMLR 2011, AdaGrad); Kingma & Ba (ICLR 2015, Adam)[1]; Dozat (ICLR 2016 Workshop, NAdam)[15].
2. Large-Batch: Goyal et al. (2017, 256-GPU ResNet-50 training with linear learning-rate scaling and warmup)[2]; You et al. (2017, LARS); You et al. (ICLR 2020, LAMB)[20].
3. Second-Order: Martens & Grosse (ICML 2015, K-FAC)[21]; Gupta et al. (ICML 2018, Shampoo)[22].
4. Variance Reduction: Johnson & Zhang (NIPS 2013, SVRG).
5. Distributed Training: Dean et al. (NIPS 2012, DistBelief); Niu et al. (NIPS 2011, Hogwild!); Sergeev & Del Balso (2018, Horovod)[13].
6. Communication: Alistarh et al. (NIPS 2017, QSGD)[3][4].
7. Hyperparameter Search: Li et al. (ICLR 2017, Hyperband)[29]; Jaderberg et al. (arXiv 2017, PBT)[27][28].
8. Frameworks: Abadi et al. (2016, TensorFlow)[12]; Paszke et al. (2019, PyTorch); Sergeev & Del Balso (2018, Horovod)[13]; Moritz et al. (OSDI 2018, Ray)[32].
9. Benchmarks: Deng et al. (2009, ImageNet); Common Crawl documentation; Tullie Murrell (2025, overview of the Criteo dataset)[10]; Hu et al. (NeurIPS 2020, OGB)[11].
10. Time-to-Accuracy: Coleman et al. (arXiv 2018, DAWNBench)[6][26].
11. [1] [2] [3] [4] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] Large-Scale Deep Learning Optimizations: A Comprehensive Survey. https://arxiv.org/pdf/2111.00856
12. [5] [12] [13] [25] [30] [33] [34] [35] The Landscape of Modern Machine Learning: A Review of Machine, Distributed and Federated Learning. https://arxiv.org/html/2312.03120v1
13. [6] [26] Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark (arXiv:1806.01427). https://arxiv.org/abs/1806.01427
14. [7] ImageNet – Wikipedia. https://en.wikipedia.org/wiki/ImageNet
15. [8] Experiments on the English Wikipedia – gensim. https://radimrehurek.com/gensim/wiki.html
16. [9] Inside Common Crawl: The Dataset Behind AI Models (and Its Real World Limits) – DEV Community. https://dev.to/extractdata/inside-common-crawl-the-dataset-behind-ai-models-and-its-real-world-limits-2eo2
17. [10] Criteo Dataset: Tackling Large-Scale Click-Through Rate Prediction | Shaped. https://www.shaped.ai/blog/criteo-dataset-tackling-large-scale-click-through-rate-prediction
18. [11] Overview of OGB-LSC | Open Graph Benchmark. https://ogb.stanford.edu/docs/lsc/
19. [24] Stochastic variance reduction – Wikipedia. https://en.wikipedia.org/wiki/Stochastic_variance_reduction
20. [27] [28] Population Based Training of Neural Networks (arXiv:1711.09846). https://arxiv.org/abs/1711.09846
21. [29] vldb.org. https://www.vldb.org/pvldb/vol15/p1256-li.pdf
22. [31] [32] [36] [37] Ray: A Distributed Framework for Emerging AI Applications (arXiv:1712.05889). https://ar5iv.labs.arxiv.org/html/1712.05889

About Us

The primary objective of IJIRCCE is to serve as an international scholarly platform that enables researchers, innovators, students, and research scholars to disseminate their research findings and technological advancements to a global academic audience.